How to Use Puppeteer Cluster to Scale Up Web Scraping

Yuvraj Chandra
May 30, 2025 · 5 min read

Scaling up a Puppeteer scraper is memory-intensive, and launching a separate browser instance for every dynamic page quickly becomes impractical. However, you can scale up by running Puppeteer browser workers concurrently using clustering.

In this article, we'll show you how Puppeteer clustering works and how to scale your scraper with the `puppeteer-cluster` library, including tips to improve scalability.

What Is Browser Clustering in Puppeteer?

Browser clustering in Puppeteer is a technique for managing multiple browser workers to handle requests concurrently. Each browser worker in a cluster independently picks up a URL from a shared queue, allowing tasks to run concurrently without waiting for others to finish.

When scraping with Puppeteer, clustering significantly speeds up URL processing compared to using a single browser instance, page, or context.

For instance, if you're scraping 12 URLs, you can share them between 3 clustered browsers instead of spinning up one instance per URL. In that case, the cluster coordinates the 3 browser instances as follows:

  • The 3 browser instances pick up 3 URLs from the queue simultaneously and run concurrently.
  • When one instance has finished processing its assigned URL, it returns to the queue to pick and scrape the next available URL.
  • Each instance in the cluster repeats this process until all 12 URLs are processed.

Clustering is handy for managing limited resources when scraping multiple pages with Puppeteer.

That said, clustering too many workers on a single node isn't recommended, as it can quickly exhaust system memory and degrade performance. For example, running 5 workers is typically more efficient than running 20 on the same machine because each browser instance carries its own memory overhead.

You can set up browser clusters in Puppeteer using the puppeteer-cluster library. It's a Puppeteer wrapper that enables you to queue and process URLs concurrently.

In the next section, you'll learn how to scale your Puppeteer scraper using puppeteer-cluster.

Concurrency Models in puppeteer-cluster

With puppeteer-cluster, you can run browser contexts, pages, or instances concurrently, depending on your project requirements.

Here are the three supported concurrency models in puppeteer-cluster, including their ideal use cases; a short configuration sketch follows the list:

  • CONCURRENCY_BROWSER: Runs multiple browser instances concurrently. This option is ideal when you need fully isolated browser state, such as User Agents, cookies, or proxies, during scraping. It's particularly useful for scraping several pages from the same domain, as that domain treats each instance as separate traffic, especially if each uses a different IP.
  • CONCURRENCY_CONTEXT: Runs multiple browser contexts concurrently under the same browser instance. Workers share browser-level resources like the User Agent and other fingerprint data points. While you can use it to scrape multiple URLs from the same domain, it increases the risk of detection since all contexts share the same browser fingerprint. However, assigning a different proxy to each context can help reduce the chances of detection.
  • CONCURRENCY_PAGE: Runs several pages concurrently within the same browser context, sharing cookies, session/local storage, and caches. This option is the most memory-efficient and ideal for scraping different domains, where shared browsing data isn't an issue. For the same domain, however, it increases the risk of detection because all pages share the same session data.
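
To make the choice concrete, here's a minimal sketch of selecting a model when launching a cluster. The model constant and maxConcurrency value below are illustrative placeholders to adapt to your workload, not recommendations:

scraper.js
// npm install puppeteer puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        // pick one: CONCURRENCY_BROWSER, CONCURRENCY_CONTEXT, or CONCURRENCY_PAGE
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        // illustrative worker count; tune to your machine's memory
        maxConcurrency: 4,
    });

    // ... define and queue tasks here, then shut down
    await cluster.idle();
    await cluster.close();
})();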

How to Use Puppeteer Cluster to Scrape at Scale

Let's now see how to scale scraping with puppeteer-cluster. We'll use the E-commerce Challenge page as the target website. Let's begin with the installation guide.

Step 1: Install Required Dependencies

puppeteer-cluster requires the standard Puppeteer. Install both libraries using npm:

Terminal
npm install puppeteer puppeteer-cluster

You're now ready to scale your Puppeteer scraping operations!

Step 2: Set Up Puppeteer Cluster

To start, import the puppeteer-cluster library. Then, configure a cluster from the Cluster class. The following configuration runs 3 browser instances concurrently:

scraper.js
// npm install puppeteer puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');

(async () => {
    // launch a cluster with Puppeteer
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_BROWSER,
        // allow up to 3 instances to run concurrently
        maxConcurrency: 3,
    });

    //  ...
})();
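
If you need to customize the underlying browsers, Cluster.launch also accepts a puppeteerOptions object that's passed through to Puppeteer's launch call. Here's a minimal sketch; the values shown are illustrative, not required:

scraper.js
// ...
const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 3,
    // passed through to puppeteer.launch(); values here are illustrative
    puppeteerOptions: {
        headless: true,
    },
});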

Your cluster of 3 browser instances is now ready. You'll see how to use it in the next step.

Step 3: Create and Queue Scraping Tasks

You'll now use the cluster to scrape multiple pages concurrently.

Specify an array of URLs to scrape and create a cluster task. Queue each URL as a task using cluster.queue. Finally, close the cluster to release system resources. Update the initial setup with these modifications, and you'll get the following complete code:

scraper.js
// npm install puppeteer puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        // launch a cluster with Puppeteer
        concurrency: Cluster.CONCURRENCY_BROWSER,
        // allow up to 3 instances to run concurrently
        maxConcurrency: 3,
    });

    // specify the URL list
    const URLs = [
        'https://www.scrapingcourse.com/ecommerce/',
        'https://www.scrapingcourse.com/ecommerce/page/2/',
        'https://www.scrapingcourse.com/ecommerce/page/3/',
        'https://www.scrapingcourse.com/ecommerce/page/4/',
        'https://www.scrapingcourse.com/ecommerce/page/5/',
        'https://www.scrapingcourse.com/ecommerce/page/6/',
    ];

    // define the task
    await cluster.task(async ({ page, data: url }) => {
        await page.goto(url);
        console.log(await page.title());
    });

    // queue the array of URLs
    URLs.forEach((url) => cluster.queue(url));

    // wait for the cluster to finish
    await cluster.idle();
    // close the cluster
    await cluster.close();
})();

The above code logs the page titles, as shown below. The output may be unordered since the pages are processed concurrently:

Output
Ecommerce Test Site to Learn Web Scraping - Page 2 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 3 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 5 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 6 - ScrapingCourse.com
Ecommerce Test Site to Learn Web Scraping - Page 4 - ScrapingCourse.com

This approach works well when scraping similar data from related pages. To assign unique tasks to specific URLs, you'll need to define them individually in the queue.

Assume you want to scrape product information from the first page and the title from the second page. You'll define a separate scraping task for each URL and pass it as the second argument to the relevant cluster.queue call:

scraper.js
// npm install puppeteer puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        // launch a cluster with Puppeteer
        concurrency: Cluster.CONCURRENCY_BROWSER,
        // allow up to 3 instances to run concurrently
        maxConcurrency: 3,
    });

    // define the task to scrape product prices
    const scrapeProduct = async ({ page, data: url }) => {
        await page.goto(url);
        const products = await page.$$eval('.product', (productEls) => {
            return productEls.map((product) => {
                return {
                    name:
                        product
                            .querySelector('.product-name')
                            ?.textContent.trim() || 'No name',
                    price:
                        product.querySelector('.price')?.textContent.trim() ||
                        'No price',
                };
            });
        });
        console.log(products);
    };

    // define the task to scrape the title
    const scrapeTitle = async ({ page, data: url }) => {
        await page.goto(url);
        const title = await page.title();
        console.log(title);
    };

    // queue each task with its target URL
    cluster.queue('https://www.scrapingcourse.com/ecommerce/', scrapeProduct);
    cluster.queue(
        'https://www.scrapingcourse.com/ecommerce/page/2/',
        scrapeTitle
    );

    // wait for the cluster to finish
    await cluster.idle();
    // close the cluster
    await cluster.close();
})();

Running this code logs the product data from the first page and the title from the second:

Output
[
  { name: 'Abominable Hoodie', price: '$69.00' },
  { name: 'Adrienne Trek Jacket', price: '$57.00' },
  // ...
  { name: 'Ariel Roll Sleeve Sweatshirt', price: '$39.00' },
  { name: 'Artemis Running Short', price: '$45.00' }
]
Ecommerce Test Site to Learn Web Scraping - Page 2 - ScrapingCourse.com

Great! You've just implemented concurrent scraping in Puppeteer using puppeteer-cluster.
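
If you'd rather collect scraped values than log them inside each task, puppeteer-cluster also offers cluster.execute, which works like cluster.queue but returns a promise that resolves with the task's return value. A minimal sketch, reusing the target site from above:

scraper.js
// ...
// execute() queues the task and resolves with its return value
const title = await cluster.execute(
    'https://www.scrapingcourse.com/ecommerce/',
    async ({ page, data: url }) => {
        await page.goto(url);
        return page.title();
    }
);
console.log(title);

Let's now look at more advanced cluster management features.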

Advanced Cluster Management

puppeteer-cluster has advanced cluster management features that make it suitable for large-scale web scraping. These options are available in the Cluster.launch configuration:

  • retryLimit: The maximum number of retries allowed for each failed task in the cluster.
  • retryDelay: Sets the pause between retries (in milliseconds).
  • skipDuplicateUrls: Automatically prevents visiting duplicate URLs. It defaults to false.
  • timeout: Sets the general timeout for all tasks (in milliseconds).
  • workerCreationDelay: Specifies the pause between each worker's creation (in milliseconds).

To add these options to your scraper, update the cluster configuration, as shown:

scraper.js
// ...
(async () => {
    const cluster = await Cluster.launch({
        // ...
        retryLimit: 3, // retry each failed task up to 3 times
        retryDelay: 2000, // wait 2 seconds between retries
        skipDuplicateUrls: true, // ignore URLs already queued
        timeout: 6000, // abort tasks after 6 seconds; raise for slow pages
        workerCreationDelay: 100, // stagger worker startup by 100ms
    });
})();
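
When a task keeps failing past its retry limit, you'll want visibility into what went wrong. puppeteer-cluster emits a taskerror event you can listen for; the handler below is a minimal sketch with illustrative logging:

scraper.js
// ...
// fires whenever a task throws; willRetry indicates if it will be re-queued
cluster.on('taskerror', (err, data, willRetry) => {
    if (willRetry) {
        console.warn(`Error scraping ${data}, retrying: ${err.message}`);
    } else {
        console.error(`Failed to scrape ${data}: ${err.message}`);
    }
});

You can also pass monitor: true to Cluster.launch to print live progress statistics for the whole cluster to the terminal.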

While puppeteer-cluster helps manage limited resources, running clusters on a local machine or a single node isn't scalable for real-life scraping tasks. The next section shows you how to scale efficiently with Puppeteer browser clusters.

Scrape at Scale Using ZenRows' Scraping Browser

Scaling up scraping on a local machine can be challenging. Too many browser instances per cluster eat up system memory, and even if you switch to contexts or pages, opening too many at once can freeze the browser and reduce overall performance.

The best way to scale efficiently is to run your Puppeteer scraper on a cloud-based solution like the ZenRows Scraping Browser. It offers the following benefits:

✔️ The ZenRows Scraping Browser allows you to distribute thousands of scraping jobs across multiple cloud nodes.

✔️ It eliminates expensive, time-consuming infrastructure maintenance.

✔️ It supports between 20 and 150 concurrent requests, allowing you to scale effortlessly.

✔️ The Scraping Browser features rotating proxies to avoid anti-bot measures like rate-limiting and IP bans.

✔️ You also get a geo-targeting option to scrape beyond borders, regardless of location.

✔️ It offers a transparent pricing model you can predict as you scale.

Let's see how to integrate the scraping browser into your existing puppeteer-cluster scraper.

Sign up on ZenRows and go to the Scraping Browser Builder. Copy and paste the browser connection URL into your existing Puppeteer scraper.

ZenRows scraping browser

Add puppeteer-core to your imports and connect to the browser endpoint using puppeteer.connect. Then, pass the connected browser as the browser option in the Cluster.launch configuration. Here's the updated scraper with the ZenRows Scraping Browser:

scraper.js
// npm install puppeteer puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');
const puppeteer = require('puppeteer-core');

(async () => {
    // set the browser endpoint
    const browser = await puppeteer.connect({
        browserWSEndpoint:
            'wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>',
    });
    const cluster = await Cluster.launch({
        // launch a cluster with Puppeteer
        concurrency: Cluster.CONCURRENCY_BROWSER,
        // allow up to 3 instances to run concurrently
        maxConcurrency: 3,
        // add the browser endpoint
        browser,
    });

    // define the task to scrape product prices
    const scrapeProduct = async ({ page, data: url }) => {
        await page.goto(url);
        const products = await page.$$eval('.product', (productEls) => {
            return productEls.map((product) => {
                return {
                    name:
                        product
                            .querySelector('.product-name')
                            ?.textContent.trim() || 'No name',
                    price:
                        product.querySelector('.price')?.textContent.trim() ||
                        'No price',
                };
            });
        });
        console.log(products);
    };

    // define the task to scrape the title
    const scrapeTitle = async ({ page, data: url }) => {
        await page.goto(url);
        const title = await page.title();
        console.log(title);
    };

    // queue each task
    cluster.queue('https://www.scrapingcourse.com/ecommerce/', scrapeProduct);
    cluster.queue(
        'https://www.scrapingcourse.com/ecommerce/page/2/',
        scrapeTitle
    );

    // wait for the cluster to finish
    await cluster.idle();
    // close the cluster and the remote browser
    await cluster.close();
    await browser.close();
})();

You'll get the following output on running this code:

Output
[
  { name: 'Abominable Hoodie', price: '$69.00' },
  { name: 'Adrienne Trek Jacket', price: '$57.00' },
  // ...
  { name: 'Ariel Roll Sleeve Sweatshirt', price: '$39.00' },
  { name: 'Artemis Running Short', price: '$45.00' }
]
Ecommerce Test Site to Learn Web Scraping - Page 2 - ScrapingCourse.com

Congratulations! 🎉 You've integrated the ZenRows Scraping Browser with puppeteer-cluster. Your Puppeteer scraper is now set for unlimited scalability.

Conclusion

You've learned how Puppeteer browser clustering works and how to use it to scale your scraping tasks.

Even with puppeteer-cluster, running several browser workers on a single node is unsustainable. To scale effortlessly with zero infrastructure setup, we recommend using the ZenRows Scraping Browser, a reliable cloud-based solution.

Try ZenRows for free!
