6 Tricks to Avoid Detection With Puppeteer

Idowu Omisola
October 10, 2024 · 6 min read

Is your Puppeteer scraper frequently blocked by anti-bots? You're not alone. Bypassing anti-bots like Cloudflare is becoming more challenging as security measures evolve.

But no worries! This article covers the six best ways to avoid detection with Puppeteer while scraping, including a pro solution with a guaranteed high success rate. Let's dive in!

Can Puppeteer Be Detected by Anti-Bots?

The short answer is yes. Puppeteer is a JavaScript library that automates browser interactions, but anti-bots often detect its automation properties, which usually results in blocking.

While controlling the browser, Puppeteer introduces automation-specific attributes, such as setting the navigator.webdriver property to true and including the HeadlessChrome token in the User-Agent string (in headless mode). Unusual browser fingerprint elements, such as missing plugins and irregular rendering behaviors, also signal automation.
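
To illustrate, here's a minimal sketch of the kind of in-page checks a detection script might run (you can paste it into the browser console); the exact heuristics vary by vendor:

Example
// illustrative only: signals a detection script may inspect
const signals = {
    // true in automated browsers unless patched
    webdriver: navigator.webdriver === true,
    // default headless Chrome advertises itself in the User-Agent
    headlessUA: navigator.userAgent.includes('HeadlessChrome'),
    // headless Chrome has historically exposed no plugins
    noPlugins: navigator.plugins.length === 0,
};

// any true value is a red flag
console.log(signals);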

Most anti-bot systems compare these automation properties against databases of allowed and disallowed characteristics to detect suspicious behavior. 

Puppeteer-controlled browsers, especially in their default configuration, often fall into the disallowed categories due to the bot-like properties described above. So, your Puppeteer web scraper has a higher chance of being flagged as a bot.

Let's prove the above point by opening this Cloudflare-protected challenge page with Puppeteer. Install Puppeteer if you haven't done so already:

Terminal
npm install puppeteer

Now, try accessing the protected page with the following code that screenshots the web page:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
        waitUntil: 'load',
    });

    // take the page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

We got the following screenshot showing that Puppeteer got blocked:

scrapingcourse cloudflare blocked screenshot

So, how can you mitigate such limitations and bypass anti-bot detection with Puppeteer? We'll show you six trusted tricks:

  • ZenRows - the ultimate solution.
  • Use proxies.
  • Use custom request headers.
  • Delay requests to mimic human behavior.
  • Block certain requests.
  • Puppeteer Stealth.

1. ZenRows — The Ultimate Solution

The easiest way to avoid anti-bot detection in Puppeteer is by using the ZenRows Scraping Browser. It's a useful tool for bypassing anti-bots while scraping with browser automation libraries such as Puppeteer.

The ZenRows Scraping Browser fortifies your Puppeteer browser instance with advanced evasions to mimic an actual user and bypass anti-bot checks. These include fixing core fingerprinting issues, such as patching the navigator fields, replacing missing plugins like the PDF viewer, fixing WebGL and Canvas rendering, and more. 

The Scraping Browser runs your browser instance in the cloud, allowing you to scale efficiently without impacting your machine's memory. It also handles other tasks, such as residential proxy rotation under the hood, to distribute your requests efficiently and evade IP bans or geo-restrictions.

Integrating the Scraping Browser into your existing Puppeteer scraper requires only a single line of code.

Let's see how it works by requesting the protected website that previously blocked our Puppeteer scraper (the Cloudflare challenge page).

First, install puppeteer-core, a Puppeteer version that doesn't include pre-installed browser binaries:

Terminal
npm install puppeteer-core

Sign up to open the ZenRows Request Builder. Then, go to the Scraping Browser Builder and copy your browser URL:

ZenRows scraping browser

Update the previous code by importing puppeteer-core and connecting Puppeteer through the browser URL, as shown:

Example
// npm install puppeteer-core
const puppeteer = require('puppeteer-core');

// define your connection URL
const connectionURL = 'wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>';

(async () => {
    // set up browser environment
    const browser = await puppeteer.connect({
        browserWSEndpoint: connectionURL,
    });

    // create a new page
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
        waitUntil: 'networkidle0',
    });

    // wait for the challenge to resolve
    await new Promise(function (resolve) {
        setTimeout(resolve, 10000);
    });

    // take the page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

The above code takes a screenshot of the protected page once the Cloudflare challenge resolves. See the result below:

cloudflare-challenge-passed

Congratulations 🎉! You've bypassed anti-bot protection using a Puppeteer-ZenRows one-liner integration.

While this is the easiest way to bypass anti-bots with Puppeteer, you can explore other manual methods if you prefer a hands-on approach to setting things up. We'll show you how they work in the next sections.

2. Use Proxies

One of the most widely adopted anti-bot strategies is IP tracking, where the bot detection system is triggered when the IP exceeds a rate limit or the request comes from a blocked region.

To avoid detection, you can use a proxy in Puppeteer, which acts as a gateway between your scraper and the server. So when you send a request to the server, it's routed via the proxy, and then the response data is sent to you.

There are two proxy categories in terms of pricing: free and premium. 

While free proxies are cost-effective, they're public and have a short lifespan, making them unsuitable for serious scraping applications.

The best choice is premium residential proxies. These proxies efficiently distribute traffic across IPs assigned to everyday internet users by network providers, reducing the chances of triggering anti-bot systems. 

To add a free proxy to Puppeteer, pass an args option containing the proxy details to the launch method. The following scraper uses a free proxy from the Free Proxy List, which may no longer work by the time you read this. Feel free to grab a fresh one from that website:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // define your proxy URL
    const proxy = 'http://178.128.113.118:23128';

    // set up the browser environment with the proxy URL
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxy}`],
    });
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://httpbin.io/ip', {
        waitUntil: 'load',
    });

    //... your scraping logic

    // close the browser instance
    await browser.close();
})();

The above code now routes requests through the proxy's IP address.
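To verify, you can replace the scraping-logic placeholder with a snippet that reads the response body. This sketch assumes Chrome wraps httpbin.io/ip's raw JSON response in a pre element, as it does for plain JSON pages:

Example
// read the response body to confirm the proxy IP is in use
const ipInfo = await page.$eval('pre', (element) => element.textContent);

// should print the proxy's IP address, not yours
console.log(ipInfo);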

Adding a premium proxy requires an extra step: authenticating with the proxy credentials, such as a username and password. Here's a sample code demonstrating how to set up an authenticated premium proxy in Puppeteer:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // define your proxy credentials
    const proxyURL = 'http://<PROXY_URL>:<PROXY_PORT>';
    const proxyUsername = '<PROXY_USERNAME>';
    const proxyPassword = '<PROXY_PASSWORD>';

    // set up the browser environment with the proxy URL
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxyURL}`],
    });
    const page = await browser.newPage();

    // authenticate with the proxy credentials
    await page.authenticate({
        username: proxyUsername,
        password: proxyPassword,
    });

    // continue navigating to a URL
    await page.goto('https://httpbin.io/ip', {
        waitUntil: 'load',
    });

    //... your scraping logic

    // close the browser instance
    await browser.close();
})();

Most premium proxy providers also offer extra functionalities, such as geo-targeting, to access geo-blocked content. 
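The exact setup is provider-specific, but many providers encode the target location in the proxy username. Here's a hypothetical sketch of that pattern; check your provider's documentation for the real format:

Example
// hypothetical username format for geo-targeting; syntax varies by provider
const proxyUsername = '<PROXY_USERNAME>-country-us';

// authenticate as before, now targeting US-based IPs
await page.authenticate({
    username: proxyUsername,
    password: '<PROXY_PASSWORD>',
});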

Check our article on the best web scraping proxies to learn more.

3. Use Custom Request Headers

Request headers carry context and metadata about an HTTP request, so they can hint to an anti-bot whether a request originates from a bot or a regular browser. You can reduce detection risks by including appropriate headers in Puppeteer's HTTP requests.

Since Puppeteer identifies itself as HeadlessChrome by default, modifying it with custom headers like User-Agent and Referer makes requests look more legitimate and fixes some fingerprinting issues.

The User-Agent header identifies the client's application, operating system, and vendor, while the Referer header indicates the URL of the page from which the request originated.

There are several ways to add request headers in Puppeteer, but the easiest is to set them on the page before navigating.

The code below modifies Puppeteer's User-Agent and Referer headers and requests https://httpbin.io/headers, a test endpoint that returns the request headers it receives:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

// define your custom headers
const requestHeaders = {
    'user-agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
    Referer: 'https://www.google.com/',
};

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // set the custom headers for all subsequent requests
    await page.setExtraHTTPHeaders({ ...requestHeaders });

    // continue navigating to a URL
    await page.goto('https://httpbin.io/headers', {
        waitUntil: 'load',
    });

    // get the page content and output it
    const bodyContent = await page.$eval('pre', (element) => element.innerHTML);
    console.log(bodyContent);

    //... other scraping logic

    // close the browser instance
    await browser.close();
})();

The above code outputs the modified request headers as shown:

Output
{
    "headers": {
        // ... other headers omitted for brevity

        "Referer": ["https://www.google.com/"],
        "User-Agent": [
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
        ]
    }
}

There are more headers you can add to Puppeteer. Check our guide on the common web scraping request headers for more.
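One caveat: setExtraHTTPHeaders only changes the headers sent over the wire, while the page's navigator.userAgent property still reports headless Chrome. For the User-Agent specifically, you can use page.setUserAgent(), which updates both. A short sketch, with Accept-Language added as an example of another common browser header:

Example
// update both the sent header and the navigator.userAgent property
await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
);

// set other common browser headers for all subsequent requests
await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-US,en;q=0.9',
    Referer: 'https://www.google.com/',
});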

4. Block Certain Requests

Blocking certain requests, such as specific third-party scripts, stops resources that run fingerprinting code from loading. With fewer such scripts executing, anti-bot systems can gather less information about your scraper.

While this approach optimizes performance and can reduce fingerprinting, there's no guarantee that the anti-bot won't detect you.

For instance, the Puppeteer scraper below blocks ads, analytics, and social media-embedded scripts using Puppeteer's built-in request interception:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // set up the browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // enable request interception
    await page.setRequestInterception(true);

    // block non-essential third-party scripts
    page.on('request', (request) => {
        const url = request.url();

        // specify patterns for scripts you want to block
        if (
            url.includes('analytics') ||
            url.includes('ads') ||
            url.includes('social')
        ) {
            // block the request
            request.abort();
        } else {
            // allow the request
            request.continue();
        }
    });

    // navigate to the target page
    await page.goto('https://www.scrapingcourse.com/ecommerce/');

    //... your scraping logic

    // close the browser
    await browser.close();
})();
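You can also block entire resource classes instead of matching URL patterns. For example, this variation on the request handler above (using the same setRequestInterception(true) setup) drops images, fonts, and media to cut bandwidth:

Example
// variation: block whole resource types rather than URL patterns
page.on('request', (request) => {
    const blockedTypes = ['image', 'font', 'media'];

    if (blockedTypes.includes(request.resourceType())) {
        // block heavyweight static assets
        request.abort();
    } else {
        // allow everything else
        request.continue();
    }
});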

5. Delay Requests to Mimic Human Behavior

As previously discussed, an anti-bot can track a user's activity through the number of requests they send. Since real users don't send hundreds of requests per second, taking breaks between requests is a good way to simulate regular user behavior and avoid detection in Puppeteer.

When navigating multiple pages, consider setting intervals between requests or waiting a few moments before clicking navigation buttons to further mimic human patterns.

For example, the following code uses a custom getRandomDelay function to pause randomly between 1 and 5 seconds before clicking the next page button:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

// function to create a random delay
function getRandomDelay(min = 1000, max = 5000) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

(async () => {
    // start Puppeteer in headless mode and open the target website
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to the initial page
    await page.goto('https://www.scrapingcourse.com/ecommerce/');

    let hasNextPage = true;

    while (hasNextPage) {
        try {
            // wait for the "next" button
            const nextButton = await page.$('a.next');

            // check if the "next" button exists
            if (nextButton) {
                // introduce a random delay before clicking the next page
                const randomDelay = getRandomDelay();

                // output the wait time that will actually be applied
                console.log(
                    `waiting for ${randomDelay}ms before clicking the next page...`
                );

                await new Promise(function (resolve) {
                    setTimeout(resolve, randomDelay);
                });

                // click the "next" button and wait for the page to load;
                // Promise.all avoids a race between the click and the navigation event
                await Promise.all([
                    page.waitForNavigation({ waitUntil: 'load' }),
                    nextButton.click(),
                ]);
            } else {
                // if no next button, stop the loop
                hasNextPage = false;
                console.log('no more pages to navigate.');
            }
        } catch (error) {
            // if there's an error (like timeout), stop the loop
            console.log('error navigating to the next page:', error);
            hasNextPage = false;
        }
    }

    // close the browser
    await browser.close();
})();

The above code applies the random wait time before each of Puppeteer's click actions. Try executing the script in headful (GUI) mode to watch the process in action:
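To do that, launch the browser with headless mode turned off:

Example
// launch a visible browser window to watch the delays in action
const browser = await puppeteer.launch({ headless: false });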

6. Puppeteer Stealth

Puppeteer has many detectable bot-like properties by default. Puppeteer Stealth is a plugin for the puppeteer-extra library that applies various anti-bot evasions to reduce the chances of detection.

Since Puppeteer Stealth is a plugin, it doesn't change Puppeteer's standard API methods. It only patches bot-like properties to spoof an actual browser. These include changing the navigator.webdriver field to false, emulating navigator.mimeTypes in headless mode, mocking the headful browser runtime environment in headless mode, and more.

Let's see how Puppeteer Stealth works step-by-step by accessing CoinTracker, a website with simple Cloudflare protection.

The first step is to install the library:

Terminal
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Import these libraries and configure Puppeteer to use the Stealth plugin:

Example
// npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

Next, launch the browser instance, visit the target web page and take a screenshot:

Example
// ...

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.cointracker.io/', {
        waitUntil: 'load',
    });

    // take the page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

Here's the complete code after combining both snippets:

Example
// npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.cointracker.io/', {
        waitUntil: 'load',
    });

    // take the page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

Puppeteer Stealth successfully bypassed the anti-bot protection on CoinTracker, as shown in the following screenshot:

CoinTracker Puppeteer Stealth screenshot

Congratulations!

Keep in mind that anti-bot security measures keep evolving and are becoming increasingly complex to bypass. Open-source bypass tools like Puppeteer Stealth often struggle to keep up with these frequent updates and still have detectable bot-like footprints. So, they're less reliable against advanced anti-bot measures, especially when scraping at scale.

For example, try to replace the above URL with the Cloudflare challenge page, and you'll see that it blocks Puppeteer Stealth:

scrapingcourse cloudflare blocked screenshot

Feel free to read our detailed guide on patching Puppeteer Stealth further to boost its anti-bot bypass capability.

Conclusion

There are different methods to avoid detection with Puppeteer, and this article covered the best and easiest of them. You can use proxies, set custom headers, delay requests, or apply Puppeteer Stealth to get the job done, but each has its limitations.

Combining these techniques yields better results, but they all share a common weakness: they're unreliable against advanced anti-bot measures.

That said, the most straightforward and most recommended approach is integrating the ZenRows Scraping Browser to handle all the stealth tweaks for you while you focus on your scraping logic.

Try ZenRows for free without a credit card!
