How to Set a Proxy in Puppeteer in 2024

Yuvraj Chandra
November 29, 2024 · 4 min read

Are you looking to set up a proxy with Puppeteer to bypass blocks and avoid IP bans? While Puppeteer excels at scraping dynamic websites, it's often targeted by anti-bot detection systems that can block your IP.

In this guide, we'll walk you through the steps to configure a proxy in Puppeteer, helping you stay undetected and scrape efficiently. Let's get started!

What Is a Proxy in Puppeteer?

A proxy works as an intermediary between a client and a server. A proxy in Puppeteer allows you to route your requests through a different IP address, masking your original IP. Proxies can help you bypass rate-limiting, geo-restrictions, or IP bans during web scraping by making your Puppeteer requests appear to originate from another location.

Whether you're setting up a free proxy or an authenticated proxy server, Puppeteer has built-in options for both. Keep reading to learn how to set up a proxy server while scraping with Puppeteer.

Setting Up a Proxy in Puppeteer

To set a proxy in Puppeteer, do the following:

  1. Get a valid proxy server URL.
  2. Configure your proxy settings using the --proxy-server Chrome flag.
  3. Connect to the target page. 

We'll use https://httpbin.org/ip, a site that returns your IP address, as the target website for this tutorial. Let's go through the entire procedure step by step.

First, obtain the URL of a proxy server from the Free Proxy List. Configure Puppeteer to start Chrome with the --proxy-server option. Then, extract the text content of the target web page and print it as a JSON value:

scraper.js
// npm install puppeteer
const puppeteer = require('puppeteer');

const scraper = async () => {
    // free proxy server URL
    const proxyURL = 'http://160.86.242.23:8080';

    // launch a browser instance with the
    // --proxy-server flag enabled
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxyURL}`],
    });
    // open a new page in the current browser context
    const page = await browser.newPage();

    // visit the target page
    await page.goto('https://httpbin.org/ip');

    // extract the IP the request comes from
    // and print it
    const body = await page.waitForSelector('body');
    const ip = await body.getProperty('textContent');
    console.log(await ip.jsonValue());

    await browser.close();
};
// execute the scraper function
scraper();

The Puppeteer Chrome instance will run all requests via the proxy server set in the flag. Here's the result:

Output
{
  "origin": "160.86.242.23"
}

That's the same IP as the proxy server, proving Puppeteer now visits the page through the specified proxy.
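
To run that comparison in code rather than by eye, you could check the reported origin against the proxy's host. The following is a minimal sketch, not part of Puppeteer: isRoutedThroughProxy is a hypothetical helper, and it assumes the target returns JSON with an origin field, as httpbin does.

```javascript
// Hypothetical helper: checks whether the IP reported by the target
// matches the host of the proxy URL passed to --proxy-server.
// Assumes the page body is JSON with an "origin" field (httpbin's format).
const isRoutedThroughProxy = (bodyText, proxyURL) => {
    const { origin } = JSON.parse(bodyText);
    const proxyHost = new URL(proxyURL).hostname;
    // some services report several comma-separated IPs, so check each one
    return String(origin)
        .split(',')
        .map((ip) => ip.trim())
        .includes(proxyHost);
};

// usage inside the scraper, after reading the body text:
// const text = await ip.jsonValue();
// console.log(isRoutedThroughProxy(text, proxyURL) ? 'proxy in use' : 'proxy not in use');
```

This check only makes sense when the proxy exits from the same address you connect to. Gateway-style proxies that hand out a different exit IP will report a different origin even when they work correctly.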

Fantastic! 🎉 You now know the basics of using a Puppeteer proxy. Let's dive into more advanced concepts!


Puppeteer Proxy Authentication: Username and Password

Commercial and premium proxy services often require authentication to use their proxies. This ensures that only users with valid credentials can connect to their servers.

Here's an example of what an authenticated proxy URL that requires a username and password looks like:

Example
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>

However, Chrome doesn't support the above syntax in the --proxy-server flag and ignores the username and password by default. To work around this, Puppeteer provides the page.authenticate() method, which accepts a pair of credentials and uses them to perform basic HTTP authentication:

Example
await page.authenticate({ username, password })

Use this method to handle proxy authentication in Puppeteer as follows:

scraper.js
// npm install puppeteer
const puppeteer = require('puppeteer');

const scraper = async () => {
    // authenticated proxy server info
    const proxyURL = 'http://138.91.159.185:8080';
    const proxyUsername = '<YOUR_USERNAME>';
    const proxyPassword = '<YOUR_PASSWORD>';

    // launch a browser instance with the
    // --proxy-server flag enabled
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxyURL}`],
    });
    // open a new page in the current browser context
    const page = await browser.newPage();

    // specify the proxy credentials before
    // visiting the page
    await page.authenticate({
        username: proxyUsername,
        password: proxyPassword,
    });

    // visit the target page
    await page.goto('https://httpbin.org/ip');

    // extract the IP the request comes from
    // and print it
    const body = await page.waitForSelector('body');
    const ip = await body.getProperty('textContent');
    console.log(await ip.jsonValue());

    await browser.close();
};

scraper();

You just configured your Puppeteer scraper to use an authenticated proxy. Great job!
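
If your provider hands you the credentials embedded in a single URL, you can split it into the parts Puppeteer needs. The sketch below uses Node's built-in URL class; splitProxyURL is a hypothetical helper name, and the URL shown is a placeholder.

```javascript
// Hypothetical helper: splits an authenticated proxy URL into the value
// for --proxy-server and the credentials for page.authenticate().
const splitProxyURL = (authenticatedURL) => {
    const url = new URL(authenticatedURL);
    return {
        // scheme, host, and port only, since Chrome ignores embedded credentials
        server: `${url.protocol}//${url.host}`,
        // the URL parser keeps credentials percent-encoded, so decode them
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password),
    };
};

// placeholder URL for illustration only
const { server, username, password } = splitProxyURL(
    'http://user:p%40ss@138.91.159.185:8080'
);
console.log(server); // http://138.91.159.185:8080
```

You would then pass server to the --proxy-server flag and { username, password } to page.authenticate(), exactly as in the script above.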

Rotating Proxies in Puppeteer

If you make too many requests quickly, the server may flag your script as a threat and ban your IP. You can reduce that risk with proxy rotation, which spreads your requests across multiple proxy servers so they arrive from different IPs. Rotating proxies helps you mimic different users and reduces the chances of anti-bot detection.

Let's learn how to implement proxy rotation and bypass anti-bot systems like Cloudflare in Puppeteer.

First, you need a list of proxies to choose from. In this example, we'll rely on a list of free proxies from the Free Proxy List as before. Use JavaScript's Math.random function to pick a random entry from the list. Because the --proxy-server flag applies to the whole browser instance, the scraper uses one randomly selected proxy per launch, so each run of the script can exit through a different IP. Finally, set the selected proxy in the --proxy-server flag:

scraper.js
// npm install puppeteer
const puppeteer = require('puppeteer');

const scraper = async () => {
    // create a proxy list
    const proxies = [
        'http://160.86.242.23:8080',
        'http://200.60.145.167:8084',
        // ...,
        'http://188.166.229.121:80',
    ];

    // pick a random proxy for this browser launch
    const randomProxy = proxies[Math.floor(Math.random() * proxies.length)];

    // launch a browser instance with the
    // --proxy-server flag enabled
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${randomProxy}`],
    });
    // open a new page in the current browser context
    const page = await browser.newPage();

    // visit the target page
    await page.goto('https://httpbin.org/ip');

    // extract the IP the request comes from
    // and print it
    const body = await page.waitForSelector('body');
    const ip = await body.getProperty('textContent');
    console.log(await ip.jsonValue());

    await browser.close();
};

scraper();

The above code selects a random proxy each time the script runs. The following is a sample result for three consecutive runs:

Output
// run 1
{
  "origin": "200.60.145.167"
}
// run 2
{
  "origin": "160.86.242.23"
}
// run 3
{
  "origin": "188.166.229.121"
}

Way to go! Your Puppeteer rotating proxy script is now ready.
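
Since the proxy is fixed when the browser launches, rotating across several URLs within a single script means launching a new browser for each one. Here is a minimal sketch of that idea using round-robin selection instead of random picks. The helper names createProxyRotator and scrapeWithRotation are hypothetical, and launching a browser per URL trades speed for simplicity.

```javascript
// Hypothetical helper: returns a function that cycles through the
// proxy list in order (round-robin), wrapping around at the end.
const createProxyRotator = (proxies) => {
    let index = 0;
    return () => proxies[index++ % proxies.length];
};

// Sketch: launch a fresh browser per URL so each visit uses the next proxy.
const scrapeWithRotation = async (urls, proxies) => {
    // required lazily so the rotator can be used without Puppeteer installed
    const puppeteer = require('puppeteer');
    const nextProxy = createProxyRotator(proxies);
    const results = [];

    for (const url of urls) {
        const browser = await puppeteer.launch({
            args: [`--proxy-server=${nextProxy()}`],
        });
        try {
            const page = await browser.newPage();
            await page.goto(url);
            results.push(await page.evaluate(() => document.body.textContent));
        } finally {
            // always release the browser, even if navigation fails
            await browser.close();
        }
    }
    return results;
};

// usage:
// scrapeWithRotation(['https://httpbin.org/ip', 'https://httpbin.org/ip'], proxies)
//     .then(console.log)
//     .catch(console.error);
```

Round-robin guarantees that consecutive visits never reuse the same proxy while the list has more than one entry, which random selection does not.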

However, this approach has limitations. You've rotated free proxies, which are suitable only for testing, not for real-life applications. Additionally, coding the rotation logic yourself is time-consuming and error-prone. The proxy list also becomes challenging to manage at scale, increasing the risk of IP bans.

Fortunately, there is a more efficient alternative. Let's dive into it!

How to Choose the Best Proxies

Free proxies are generally shared. So, they have a short lifespan, low success rates, and poor IP reputations. Most websites you'll encounter will block them easily, limiting your scraping activities. 

The most reliable solution is to use paid or premium proxies, which are, fortunately, inexpensive. 

ZenRows is a top premium proxy provider with a vast proxy pool of 55M+ residential IPs across 185+ countries. It offers advanced functionalities, including proxy rotation to efficiently distribute your traffic across several locations and a flexible geo-location feature to access geo-restricted content at scale.

Let's see how it works! 

Sign up for free to open the ZenRows Request Builder. Go to Residential Proxies to open the Proxy Generator. Then, copy your proxy credentials (username and password).

[Image: generating residential proxies with ZenRows]

Now, integrate ZenRows proxy into your Puppeteer script like so:

scraper.js
// npm install puppeteer
const puppeteer = require('puppeteer');

const scraper = async () => {
    // authenticated proxy server info
    const proxyURL = 'http://superproxy.zenrows.com:1337';
    const proxyUsername = '<ZENROWS_PROXY_USERNAME>';
    const proxyPassword = '<ZENROWS_PROXY_PASSWORD>';

    // launch a browser instance with the
    // --proxy-server flag enabled
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxyURL}`],
    });
    // open a new page in the current browser context
    const page = await browser.newPage();

    // specify the proxy credentials before
    // visiting the page
    await page.authenticate({
        username: proxyUsername,
        password: proxyPassword,
    });

    // visit the target page
    await page.goto('https://httpbin.org/ip');

    // extract the IP the request comes from
    // and print it
    const body = await page.waitForSelector('body');
    const ip = await body.getProperty('textContent');
    console.log(await ip.jsonValue());

    await browser.close();
};

scraper();

Here's an example of what the output looks like:

Output
{
  "origin": "113.61.63.15"
}

Incredible! Your Puppeteer scraper now routes its requests through premium residential proxies, which are far more reliable than free ones.
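
Before sharing or committing a script like this, you may want to keep the credentials out of the source file. One common option is to read them from environment variables. The sketch below assumes variable names PROXY_URL, PROXY_USERNAME, and PROXY_PASSWORD, which are choices made for this example rather than names defined by Puppeteer or ZenRows.

```javascript
// Hypothetical helper: reads proxy settings from environment variables.
// The variable names are assumptions made for this sketch.
const loadProxyConfig = (env = process.env) => {
    const { PROXY_URL, PROXY_USERNAME, PROXY_PASSWORD } = env;
    if (!PROXY_URL || !PROXY_USERNAME || !PROXY_PASSWORD) {
        throw new Error('Set PROXY_URL, PROXY_USERNAME, and PROXY_PASSWORD');
    }
    return {
        proxyURL: PROXY_URL,
        proxyUsername: PROXY_USERNAME,
        proxyPassword: PROXY_PASSWORD,
    };
};

// usage in scraper.js:
// const { proxyURL, proxyUsername, proxyPassword } = loadProxyConfig();
```

Failing fast with a clear message when a variable is missing is usually easier to debug than a proxy authentication error later in the run.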

Troubleshooting Puppeteer Proxy Server Issues

While setting up a Puppeteer proxy, you might encounter errors due to misconfigurations, connectivity issues, an unreachable server, or an incorrect proxy address. If your proxy connection keeps failing, here's how to troubleshoot it.

  • Validate Proxy Setup: The first troubleshooting step is to run your Puppeteer scraper without a proxy configuration. If Puppeteer runs successfully without an error, your proxy server setup may be faulty. Verify your proxy address, and ensure the proxy option is passed correctly into the puppeteer.launch function. 
  • Verify Authentication Credentials: An HTTP error 407 while using an authenticated proxy specifically signals authentication issues. To resolve this issue, ensure you correctly input proxy credentials, such as username and password.
  • Confirm Proxy Accessibility: Use tools like cURL or telnet to test the proxy connection and confirm whether the server is available and accessible. 
  • Enable Detailed Debugging: Turn on verbose logging (for example, by setting the DEBUG="puppeteer:*" environment variable) or launch Puppeteer with devtools: true to inspect connection errors in the browser instance's console.
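
The checks above can be partly automated by inspecting the error that page.goto() throws. The following sketch maps a few Chromium network error codes, which appear in the error message, to the matching troubleshooting step. diagnoseProxyError is a hypothetical helper, and the mapping is illustrative rather than exhaustive.

```javascript
// Hypothetical helper: maps a navigation failure to one of the
// troubleshooting steps above by looking for Chromium network error
// codes (or a timeout) in the error message.
const diagnoseProxyError = (error) => {
    const message = String(error && error.message ? error.message : error);
    if (message.includes('ERR_PROXY_CONNECTION_FAILED')) {
        return 'Proxy unreachable: check the address and port, then test it with cURL.';
    }
    if (message.includes('ERR_TUNNEL_CONNECTION_FAILED')) {
        return 'Proxy refused the tunnel: check your credentials and whether the proxy allows this target.';
    }
    if (/TIMED_OUT|timeout/i.test(message)) {
        return 'Connection timed out: the proxy may be slow or offline.';
    }
    return 'Unrecognized error: run the scraper without a proxy to isolate the cause.';
};

// usage inside the scraper:
// try {
//     await page.goto('https://httpbin.org/ip');
// } catch (error) {
//     console.error(diagnoseProxyError(error));
// }
```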

Conclusion

This step-by-step tutorial explained how to configure a proxy in Puppeteer, from basic setup to proxy rotation, authentication, and troubleshooting common issues. Rotating proxies increases your chances of bypassing blocks by distributing traffic across several IPs. 

Remember that free proxies are unreliable. It's best to opt for premium proxies, such as the ZenRows residential proxies, which offer autorotation and geo-location features out of the box.

Try ZenRows for free today without a credit card!
