How to Scale Your Scraping Operations Using Puppeteer Pool

Yuvraj Chandra
May 5, 2025 · 4 min read

A single Puppeteer browser instance consumes between 200 and 300 MB of memory. Yet web scraping rarely stops at a single request: you usually need to fetch many pages, which means running several browser instances, eating into your system's RAM, and slowing down your scraper.

A Puppeteer pool provides a batching structure that lets you scale even with limited resources. In this article, we'll show you how to batch your Puppeteer scraping jobs concurrently using Node's generic-pool module.

Understanding Browser Pooling

Browser pooling means sharing a limited number of browser instances among multiple tasks. Its primary purpose is to manage system resources by capping how many browser instances run at once and reusing them across tasks.


For instance, if you're scraping 20 pages with Puppeteer, you can use pooling to share the work among 5 browser instances instead of launching 20. Here's what happens behind the scenes (a plain-JavaScript sketch of this flow follows the list):

  • Puppeteer launches 5 browser instances.
  • Each instance picks up one page from the list of 20.
  • The 5 browser instances run concurrently, each handling one page at a time.
  • Once done, the instances pick up the next 5 pages, leaving 10 in the queue.
  • The process repeats until the scraping job is complete.
  • Finally, the browser instances are closed to release system resources.
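
To make that flow concrete, here's a rough, plain-Puppeteer sketch of the same batching idea without any pooling library. The urls list is a hypothetical placeholder, and the logic only prints page titles; the rest of this article uses generic-pool rather than this manual approach:

batch-example.js
// hypothetical standalone sketch; not part of the tutorial's scraper.js
const puppeteer = require('puppeteer');

// placeholder list of 20 URLs; replace with your real targets
const urls = Array.from({ length: 20 }, (_, i) => `https://example.com/page/${i + 1}`);

const runBatches = async () => {
    // launch a fixed "pool" of 5 browsers
    const browsers = await Promise.all(
        Array.from({ length: 5 }, () => puppeteer.launch())
    );

    // process the URLs in chunks of 5, one page per browser at a time
    for (let i = 0; i < urls.length; i += browsers.length) {
        const chunk = urls.slice(i, i + browsers.length);
        await Promise.all(
            chunk.map(async (url, j) => {
                const page = await browsers[j].newPage();
                try {
                    await page.goto(url);
                    console.log(url, await page.title());
                } finally {
                    await page.close();
                }
            })
        );
    }

    // close all browsers to release system resources
    await Promise.all(browsers.map((browser) => browser.close()));
};

runBatches();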

That said, you don't need to manage this batching yourself. Pooling tools like generic-pool handle pooling and concurrency together by default: once you specify a pool size, they queue incoming tasks and run up to that many at once.

While browser pooling can improve your scraper's performance, inadequate batch setup or browser logic can result in issues like memory leaks and RAM overuse.

You can avoid these issues by releasing each browser back to the pool and closing its pages after every task, and by limiting the pool size so batches don't overuse system memory.
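
In practice, that cleanup follows the pattern below. This is a minimal sketch that assumes a browserPool created with generic-pool (we'll set one up properly in the next section) and a hypothetical URL:

// cleanup pattern: always close the page and return the browser to the pool
const scrapeOne = async (url) => {
    const browser = await browserPool.acquire(); // borrow a browser from the pool
    const page = await browser.newPage();
    try {
        await page.goto(url);
        // ... scraping logic
    } finally {
        await page.close(); // free the tab's memory
        await browserPool.release(browser); // return the browser for the next task
    }
};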

Next, we'll show you how to run a Puppeteer pool using Node's generic-pool library.


How to Use Puppeteer With Generic Pool to Scrape at Scale

In this section, you'll use the generic-pool library to run batch Puppeteer scraping jobs. But first, let's install the required modules.

Step 1: Installing Required Dependencies

Before we begin, install Puppeteer and the generic-pool module as shown:

Terminal
npm install puppeteer generic-pool

All done? You're now ready to batch Puppeteer scraping jobs with pooling and concurrency.

Step 2: Setting Up a Browser Pool

A browser pool holds the configuration for managing reusable browser instances: the launch and teardown sequence, the maximum pool size, and the minimum number of idle browsers to keep ready.

To set up a browser pool for Puppeteer, import Puppeteer and generic-pool and define the configuration:

scraper.js
// npm install puppeteer generic-pool
const puppeteer = require('puppeteer');
const genericPool = require('generic-pool');

// create pool using generic-pool
const browserPool = genericPool.createPool(
    // browser startup and teardown rules
    {
        create: async () => await puppeteer.launch(),
        destroy: async (browser) => await browser.close(),
    },
    {
        max: 5,
        min: 1,
        idleTimeoutMillis: 30000,
    }
);

Let's quickly see what each configuration does:

  • create: Instantiates a new browser instance in the pool.
  • destroy: Closes the browser instance after the batch job.
  • max: The maximum number of browser instances the pool will create. The above setting runs at most 5 browsers at once.
  • min: The minimum number of warm, idle browsers the pool keeps ready to pick up new jobs. We've set this to 1 to avoid resource overuse in the background.
  • idleTimeoutMillis: How long (30 seconds here) a browser can sit idle in the pool before it becomes eligible for eviction. A few more optional settings are sketched right after this list.
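
generic-pool also accepts optional settings that can help at scale. The variant below is an illustrative sketch, assuming generic-pool v3's option names (acquireTimeoutMillis, maxWaitingClients, testOnBorrow, evictionRunIntervalMillis) and an optional validate function in the factory; the values are arbitrary, and you can use it in place of the pool definition above:

scraper.js
// optional variant of the pool above with extra (illustrative) settings
const browserPool = genericPool.createPool(
    {
        create: async () => await puppeteer.launch(),
        destroy: async (browser) => await browser.close(),
        // health check used when testOnBorrow is enabled
        validate: async (browser) => browser.isConnected(),
    },
    {
        max: 5,
        min: 1,
        idleTimeoutMillis: 30000,
        acquireTimeoutMillis: 60000, // give up on acquire() after 60 seconds
        maxWaitingClients: 50, // cap how many tasks can queue for a browser
        testOnBorrow: true, // run validate() before handing out a browser
        evictionRunIntervalMillis: 15000, // check for idle browsers every 15 seconds
    }
);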

Next, create a function to manage your scraping jobs. This function acquires a browser instance from the pool, runs the task, and then releases the browser back to the pool.

scraper.js
// ...
// run each scrape job using a pooled browser
const scrapePage = async (job) => {
    const browser = await browserPool.acquire();
    try {
        // ... instructions for running the scraping jobs
    } finally {
        await browserPool.release(browser);
    }
};

We'll create the scraping jobs in the next section.

Step 3: Creating Scraping Tasks

The browser pool and the scraping task manager (scrapePage) are now ready. Our scraping tasks will target three different pages, and each will handle its own scraping logic.

To let the scrapePage function pull the scraping tasks cleanly, define each task in an object array. Each task independently accepts a browser parameter and returns an array of objects containing the scraped data:

scraper.js
// ...

// define each target site's scraping logic in an object array
const scrapingJobs = [
    {
        url: 'https://www.scrapingcourse.com/ecommerce/',
        scrape: async (browser) => {
            const page = await browser.newPage();
            try {
                await page.goto('https://www.scrapingcourse.com/ecommerce/');
                await page.waitForSelector('.product');
                return await page.$$eval('.product', (productEls) => {
                    return productEls.map((product) => {
                        return {
                            name: product
                                .querySelector('.product-name')
                                .textContent.trim(),
                            price: product
                                .querySelector('.price')
                                .textContent.trim(),
                        };
                    });
                });
            } finally {
                await page.close();
            }
        },
    },
    {
        url: 'https://www.scrapingcourse.com/javascript-rendering',
        scrape: async (browser) => {
            const page = await browser.newPage();
            try {
                await page.goto(
                    'https://www.scrapingcourse.com/javascript-rendering'
                );
                await page.waitForSelector('.product-item');
                return await page.$$eval('.product-item', (productEls) => {
                    return productEls.map((product) => {
                        return {
                            name: product
                                .querySelector('.product-name')
                                .textContent.trim(),
                            price: product
                                .querySelector('.product-price')
                                .textContent.trim(),
                        };
                    });
                });
            } finally {
                await page.close();
            }
        },
    },
    {
        url: 'https://www.scrapingcourse.com/infinite-scrolling',
        scrape: async (browser) => {
            const page = await browser.newPage();
            try {
                await page.goto(
                    'https://www.scrapingcourse.com/infinite-scrolling'
                );
                await page.waitForSelector('.product-item');

                await page.evaluate(async () => {
                    for (let i = 0; i < 5; i++) {
                        window.scrollBy(0, window.innerHeight);
                        await new Promise((r) => setTimeout(r, 1000));
                    }
                });

                return await page.$$eval('.product-item', (productEls) => {
                    return productEls.map((product) => {
                        return {
                            name: product
                                .querySelector('.product-name')
                                .textContent.trim(),
                            price: product
                                .querySelector('.product-price')
                                .textContent.trim(),
                        };
                    });
                });
            } finally {
                await page.close();
            }
        },
    },
];

Now, update the scrapePage function to execute each scraping task with a browser acquired from the pool:

scraper.js
// ...

// run each scrape job using a pooled browser
const scrapePage = async (job) => {
    // ...
    try {
        // execute the scraping task from the job array
        return await job.scrape(browser);
    } finally {
        // ...
    }
};
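
Optionally, before wiring up the full batch in the next step, you can sanity-check a single job by calling scrapePage directly and shutting the pool down afterwards (drain and clear are the same generic-pool methods we'll use in Step 4). This is a temporary snippet; remove it before moving on:

scraper.js
// optional, temporary sanity check: run one job, then shut the pool down
(async () => {
    const data = await scrapePage(scrapingJobs[0]);
    console.log(JSON.stringify(data, null, 2));
    await browserPool.drain();
    await browserPool.clear();
})();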

Step 4: Running the Concurrent Batch Scraping Job

The final phase is to assign each scraping job to a browser instance within the pool. Since the pool can manage up to 5 browsers at once and we have only 3 jobs, each job gets its own browser.

Define a scrapeAll function that maps each scraping task to the scrapePage function and awaits all the results with Promise.all. It also logs the URL for each result (looked up by array index) so you can tell which target site the data belongs to. Finally, it drains and clears the pool to release system resources.

Finally, execute this function to run the batch job:

scraper.js
// ...

// define a function to handle concurrent batch jobs
const scrapeAll = async () => {
    try {
        // execute all scraping tasks
        const results = await Promise.all(scrapingJobs.map(scrapePage));
        results.forEach((data, index) => {
            // get the site URL for the scraped data
            console.log(`\n--- Results for: ${scrapingJobs[index].url} ---`);
            // return the data scraped from each page
            console.log(JSON.stringify(data, null, 2));
        });
    } catch (err) {
        console.error('Scraping failed:', err);
    } finally {
        // drain and clear the batch pool to release resources
        await browserPool.drain();
        await browserPool.clear();
    }
};

// execute the concurrent batch job
scrapeAll();

Put all the snippets together, and you'll get the complete code below:

scraper.js
// npm install puppeteer generic-pool
const puppeteer = require('puppeteer');
const genericPool = require('generic-pool');

// define each target site's scraping logic in an object array
const scrapingJobs = [
    {
        url: 'https://www.scrapingcourse.com/ecommerce/',
        scrape: async (browser) => {
            const page = await browser.newPage();
            try {
                await page.goto('https://www.scrapingcourse.com/ecommerce/');
                await page.waitForSelector('.product');
                return await page.$$eval('.product', (productEls) => {
                    return productEls.map((product) => {
                        return {
                            name: product
                                .querySelector('.product-name')
                                .textContent.trim(),
                            price: product
                                .querySelector('.price')
                                .textContent.trim(),
                        };
                    });
                });
            } finally {
                await page.close();
            }
        },
    },
    {
        url: 'https://www.scrapingcourse.com/javascript-rendering',
        scrape: async (browser) => {
            const page = await browser.newPage();
            try {
                await page.goto(
                    'https://www.scrapingcourse.com/javascript-rendering'
                );
                await page.waitForSelector('.product-item');
                return await page.$$eval('.product-item', (productEls) => {
                    return productEls.map((product) => {
                        return {
                            name: product
                                .querySelector('.product-name')
                                .textContent.trim(),
                            price: product
                                .querySelector('.product-price')
                                .textContent.trim(),
                        };
                    });
                });
            } finally {
                await page.close();
            }
        },
    },
    {
        url: 'https://www.scrapingcourse.com/infinite-scrolling',
        scrape: async (browser) => {
            const page = await browser.newPage();
            try {
                await page.goto(
                    'https://www.scrapingcourse.com/infinite-scrolling'
                );
                await page.waitForSelector('.product-item');

                await page.evaluate(async () => {
                    for (let i = 0; i < 5; i++) {
                        window.scrollBy(0, window.innerHeight);
                        await new Promise((r) => setTimeout(r, 1000));
                    }
                });

                return await page.$$eval('.product-item', (productEls) => {
                    return productEls.map((product) => {
                        return {
                            name: product
                                .querySelector('.product-name')
                                .textContent.trim(),
                            price: product
                                .querySelector('.product-price')
                                .textContent.trim(),
                        };
                    });
                });
            } finally {
                await page.close();
            }
        },
    },
];

// create pool using generic-pool
const browserPool = genericPool.createPool(
    // browser startup and teardown rules
    {
        create: async () => await puppeteer.launch(),
        destroy: async (browser) => await browser.close(),
    },
    {
        max: 5,
        min: 1,
        idleTimeoutMillis: 30000,
    }
);

// run each scrape job using a pooled browser
const scrapePage = async (job) => {
    const browser = await browserPool.acquire();
    try {
        // execute the scraping task from the job array
        return await job.scrape(browser);
    } finally {
        await browserPool.release(browser);
    }
};

// define a function to handle concurrent batch jobs
const scrapeAll = async () => {
    try {
        // execute all scraping tasks
        const results = await Promise.all(scrapingJobs.map(scrapePage));
        results.forEach((data, index) => {
            // get the site URL for the scraped data
            console.log(`\n--- Results for: ${scrapingJobs[index].url} ---`);
            // return the data scraped from each page
            console.log(JSON.stringify(data, null, 2));
        });
    } catch (err) {
        console.error('Scraping failed:', err);
    } finally {
        // drain and clear the batch pool to release resources
        await browserPool.drain();
        await browserPool.clear();
    }
};

// execute the concurrent batch job
scrapeAll();

Running the above code returns the following, showing your scraper now runs a concurrent batch job for all tasks:

Output
--- Results for: https://www.scrapingcourse.com/ecommerce/ ---
[
  {
    "name": "Abominable Hoodie",
    "price": "$69.00"
  },

  // ... omitted for brevity,

  {
    "name": "Artemis Running Short",
    "price": "$45.00"
  }
]

--- Results for: https://www.scrapingcourse.com/javascript-rendering ---
[
  {
    "name": "Chaz Kangeroo Hoodie",
    "price": "$52"
  },

  // ... omitted for brevity,

  {
    "name": "Ajax Full-Zip Sweatshirt",
    "price": "$69"
  }
]

--- Results for: https://www.scrapingcourse.com/infinite-scrolling ---
[
  {
    "name": "Chaz Kangeroo Hoodie",
    "price": "$52"
  },

  // ... omitted for brevity,
 
  {
    "name": "Mars HeatTech&trade; Pullover",
    "price": "$66"
  }
]

Great! You just implemented a Puppeteer pool that runs scraping jobs concurrently. Any jobs beyond the pool size now wait in a queue until a browser becomes available.

Scrape at Scale Using ZenRows' Scraping Browser

Scaling up browser automation for scraping can be challenging. Even with a Puppeteer browser pool, running too many Chrome instances per batch (e.g., more than 5) in a local setup reduces performance, since a single machine has to manage them all.

You can handle this challenge by moving your browser instances to the cloud with the ZenRows Scraping Browser, which spins them up remotely and distributes them across several nodes.

In addition to relieving your local machine's RAM, the Scraping Browser offers scalability. Depending on your plan, the Scraping Browser allows you to run between 20 and 150 concurrent browsers. It further distributes the tasks in each concurrent batch across several nodes, giving you low latency.

The ZenRows Scraping Browser also routes requests through rotating residential proxies, preventing scraping limitations such as IP bans and geo-restrictions. You can easily plug the Scraping Browser into your existing Puppeteer scraper.

Let's see how to plug it into the current Puppeteer pool scraper.

To get started, install puppeteer-core, a lightweight version of Puppeteer without browser binaries:

Terminal
npm install puppeteer-core

Next, sign up on ZenRows and go to the Scraping Browser Builder. Then, copy and paste the browser connection URL into your existing Puppeteer scraper.

ZenRows scraping browser

Replace the puppeteer import with puppeteer-core. Then, update the browser pool in your current scraper to connect to the cloud browser instance and increase the maximum number of browsers:

scraper.js
// npm install puppeteer-core generic-pool
const puppeteer = require('puppeteer-core');
const connectionURL = 'wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>';
const genericPool = require('generic-pool');

// ... scraping tasks

// create pool using generic-pool
const browserPool = genericPool.createPool(
    // browser startup and teardown rules
    {
        create: async () =>
            await puppeteer.connect({ browserWSEndpoint: connectionURL }),
        destroy: async (browser) => await browser.close(),
    },
    {
        max: 15,
        min: 2,
        idleTimeoutMillis: 30000,
    }
);

// ...

Congratulations! Your Puppeteer pool scraping requests now use cloud browser instances. You're on the right path to scalability.

Conclusion

You've learned to queue and execute batched scraping tasks concurrently using Puppeteer with the generic-pool module. Pooling Puppeteer requests is an important step towards scalability.

As mentioned earlier, managing a Puppeteer pool in a local environment isn't scalable, since browser instances often overuse system memory. To scale reliably without putting pressure on your local machine, we recommend the ZenRows Scraping Browser.

Try ZenRows for free!
