Are you looking to set up a proxy with Puppeteer to bypass blocks and avoid IP bans? While Puppeteer excels at scraping dynamic websites, it's often targeted by anti-bot detection systems that can block your IP.
In this guide, we'll walk you through the steps to configure a proxy in Puppeteer, helping you stay undetected and scrape efficiently. Let's get started!
- Setting up a proxy in Puppeteer.
- Puppeteer proxy authentication: username and password.
- Rotating proxies in Puppeteer.
- How to choose the best proxy.
What Is a Proxy in Puppeteer?
A proxy works as an intermediary between a client and a server. A proxy in Puppeteer allows you to route your requests through a different IP address, masking your original IP. Proxies can help you bypass rate-limiting, geo-restrictions, or IP bans during web scraping by making your Puppeteer requests appear to originate from another location.
Whether you're setting up a free proxy or an authenticated proxy server, Puppeteer has built-in properties to achieve this. Keep reading to learn how to set up a proxy server while scraping with Puppeteer.
Setting Up a Proxy in Puppeteer
To set a proxy in Puppeteer, do the following:
- Get a valid proxy server URL.
- Configure your proxy settings using the --proxy-server Chrome flag.
- Connect to the target page.
We'll use https://httpbin.org/ip, a site that returns your IP address, as the target website for this tutorial. Let's go through the entire procedure step by step.
First, obtain the URL of a proxy server from the Free Proxy List. Configure Puppeteer to start Chrome with the --proxy-server option. Then, extract the text content of the target web page and print it as a JSON value:
// npm install puppeteer
const puppeteer = require('puppeteer');
const scraper = async () => {
  // free proxy server URL
  const proxyURL = 'http://160.86.242.23:8080';
  // launch a browser instance with the
  // --proxy-server flag enabled
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyURL}`],
  });
  // open a new page in the current browser context
  const page = await browser.newPage();
  // visit the target page
  await page.goto('https://httpbin.org/ip');
  // extract the IP the request comes from
  // and print it
  const body = await page.waitForSelector('body');
  const ip = await body.getProperty('textContent');
  console.log(await ip.jsonValue());
  await browser.close();
};
// execute the scraper function
scraper();
The Puppeteer Chrome instance will run all requests via the proxy server set in the flag. Here's the result:
{
"origin": "160.86.242.23"
}
That's the same IP as the proxy server, proving Puppeteer now visits the page through the specified proxy.
Free proxies are unreliable and short-lived, so the one used in the snippet above is unlikely to work at the time of reading. Don't worry; we'll explore a better alternative later in the article.
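Because free proxies fail so often, it can help to wrap the request in error handling so a dead proxy doesn't leave a Chrome process running. The following is a minimal sketch rather than part of the original script: `fetchIpViaProxy` is a hypothetical helper, the 15-second timeout is an arbitrary choice, and the Puppeteer module is passed in as a parameter so the function is easy to swap or stub.

```javascript
// Hypothetical helper: load the target page through one proxy and
// return the body text, or null if the proxy fails.
// `puppeteer` is the module returned by require('puppeteer').
const fetchIpViaProxy = async (puppeteer, proxyURL, timeoutMs = 15000) => {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyURL}`],
  });
  try {
    const page = await browser.newPage();
    // fail fast instead of waiting for the default navigation timeout
    await page.goto('https://httpbin.org/ip', { timeout: timeoutMs });
    return await page.$eval('body', (el) => el.textContent);
  } catch (error) {
    console.error(`Proxy ${proxyURL} failed: ${error.message}`);
    return null;
  } finally {
    // always release the Chrome process, even when navigation throws
    await browser.close();
  }
};

// usage (requires `npm install puppeteer`):
// fetchIpViaProxy(require('puppeteer'), 'http://160.86.242.23:8080').then(console.log);
```

Returning `null` instead of throwing is a design choice here: it lets the caller decide whether to retry with another proxy.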
Fantastic! 🎉 You now know the basics of using a Puppeteer proxy. Let's dive into more advanced concepts!
Puppeteer Proxy Authentication: Username and Password
Commercial and premium proxy services often require authentication to use their proxies. This ensures that only users with valid credentials can connect to their servers.
Here's an example of what an authenticated proxy URL that requires a username and password looks like:
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
However, Chrome doesn't support the above syntax and ignores the username and password by default. To resolve this, Puppeteer introduced the authenticate() method, which accepts a pair of credentials and uses them to perform basic HTTP authentication:
await page.authenticate({ username, password })
Use this method to handle proxy authentication in Puppeteer as follows:
// npm install puppeteer
const puppeteer = require('puppeteer');
const scraper = async () => {
  // authenticated proxy server info
  const proxyURL = 'http://138.91.159.185:8080';
  const proxyUsername = '<YOUR_USERNAME>';
  const proxyPassword = '<YOUR_PASSWORD>';
  // launch a browser instance with the
  // --proxy-server flag enabled
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyURL}`],
  });
  // open a new page in the current browser context
  const page = await browser.newPage();
  // specify the proxy credentials before
  // visiting the page
  await page.authenticate({
    username: proxyUsername,
    password: proxyPassword,
  });
  // visit the target page
  await page.goto('https://httpbin.org/ip');
  // extract the IP the request comes from
  // and print it
  const body = await page.waitForSelector('body');
  const ip = await body.getProperty('textContent');
  console.log(await ip.jsonValue());
  await browser.close();
};
scraper();
If the credentials are wrong, proxy servers typically return a 407: Proxy Authentication Required error, and the script may fail with ERR_HTTP_RESPONSE_CODE_FAILURE. So, always ensure your username and password are valid.
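If your provider gives you a single URL in the authenticated format shown earlier, one way to use it is to split it into the two pieces Puppeteer needs. The sketch below assumes Node's built-in `URL` class; `splitProxyURL` is a hypothetical helper name, and the sample credentials are made up for illustration.

```javascript
// Hypothetical helper: split an authenticated proxy URL into the value
// for --proxy-server and the credentials for page.authenticate().
const splitProxyURL = (authenticatedURL) => {
  const url = new URL(authenticatedURL);
  return {
    // protocol://host:port, with the credentials stripped
    server: `${url.protocol}//${url.host}`,
    // URL-decoded, since special characters are percent-encoded in URLs
    username: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
  };
};

const { server, username, password } = splitProxyURL(
  'http://user:p%40ss@138.91.159.185:8080'
);
console.log(server); // http://138.91.159.185:8080
console.log(username, password); // user p@ss
```

The `server` value would go into the `--proxy-server` flag and the other two into `page.authenticate()`. Note that the `URL` parser drops a scheme's default port (for example, `:80` on an `http://` URL), so `server` may come back without a port in that case.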
You just configured your Puppeteer scraper to use an authenticated proxy. Great job!
Rotating Proxies in Puppeteer
If you make too many requests quickly, the server may flag your script as a threat and ban your IP. You can prevent that by using proxy rotation, which automatically switches IPs for each request from multiple proxy servers. Rotating proxies helps you mimic different users and reduces the chances of anti-bot detection.
Let's learn how to implement proxy rotation and bypass anti-bot systems like Cloudflare in Puppeteer.
First, you need a list of proxies to choose from. In this example, we'll rely on a list of free proxies from the Free Proxy List as before. Use JavaScript's Math.random function to pick a random entry from the list, so your Puppeteer scraper selects a different proxy address each time it runs. Finally, set the randomly selected proxy in the --proxy-server flag:
// npm install puppeteer
const puppeteer = require('puppeteer');
const scraper = async () => {
  // create a proxy list
  const proxies = [
    'http://160.86.242.23:8080',
    'http://200.60.145.167:8084',
    // ...,
    'http://188.166.229.121:80',
  ];
  // pick a random proxy for this browser launch
  const randomProxy = proxies[Math.floor(Math.random() * proxies.length)];
  // launch a browser instance with the
  // --proxy-server flag enabled
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${randomProxy}`],
  });
  // open a new page in the current browser context
  const page = await browser.newPage();
  // visit the target page
  await page.goto('https://httpbin.org/ip');
  // extract the IP the request comes from
  // and print it
  const body = await page.waitForSelector('body');
  const ip = await body.getProperty('textContent');
  console.log(await ip.jsonValue());
  await browser.close();
};
scraper();
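The above code picks a random proxy, and therefore a random IP address, each time the script runs. Because the --proxy-server flag is applied at launch, the IP stays the same for the lifetime of that browser instance. The following is a sample result for three consecutive runs, each making one request: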
// request 1
{
"origin": "200.60.145.167"
}
// request 2
{
"origin": "160.86.242.23"
}
// request 3
{
"origin": "188.166.229.121"
}
Way to go! Your Puppeteer rotating proxy script is now ready.
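Random selection can pick the same proxy several times in a row. If you'd rather cycle through the list evenly, a round-robin picker is one alternative. This is a minimal sketch, not the approach used in the script above: `createRotator` and `nextProxy` are hypothetical names, and the list reuses the three addresses shown earlier.

```javascript
// Hypothetical helper: returns a function that hands out proxies in
// round-robin order, wrapping around at the end of the list.
const createRotator = (proxies) => {
  if (proxies.length === 0) {
    throw new Error('The proxy list is empty');
  }
  let index = 0;
  return () => {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length;
    return proxy;
  };
};

const nextProxy = createRotator([
  'http://160.86.242.23:8080',
  'http://200.60.145.167:8084',
  'http://188.166.229.121:80',
]);

// each call would feed one browser launch, for example:
// puppeteer.launch({ args: [`--proxy-server=${nextProxy()}`] })
for (let i = 0; i < 4; i += 1) {
  console.log(nextProxy());
}
```

The loop prints the three addresses in order and then wraps back to the first one. Since the flag is read when Chrome starts, each new proxy still requires a new browser launch.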
However, the limitation of this approach is that you've rotated free proxies, which are only suitable for testing rather than real-life applications. Additionally, coding the rotation logic yourself is time-consuming and error-prone. The proxy list also becomes challenging to manage at scale, increasing the risk of IP bans.
Fortunately, there is a more efficient alternative. Let's dive into it!
How to Choose the Best Proxies
Free proxies are generally shared. So, they have a short lifespan, low success rates, and poor IP reputations. Most websites you'll encounter will block them easily, limiting your scraping activities.
The most reliable solution is to use paid or premium proxies, which are, fortunately, inexpensive.
ZenRows is a top premium proxy provider with a vast proxy pool of 55M+ residential IPs across 185+ countries. It offers advanced functionalities, including proxy rotation to efficiently distribute your traffic across several locations and a flexible geo-location feature to access geo-restricted content at scale.
Let's see how it works!
Sign up for free to open the ZenRows Request Builder. Go to Residential Proxies to open the Proxy Generator. Then, copy your proxy credentials (username and password).
Now, integrate the ZenRows proxy into your Puppeteer script like so:
// npm install puppeteer
const puppeteer = require('puppeteer');
const scraper = async () => {
  // authenticated proxy server info
  const proxyURL = 'http://superproxy.zenrows.com:1337';
  const proxyUsername = '<ZENROWS_PROXY_USERNAME>';
  const proxyPassword = '<ZENROWS_PROXY_PASSWORD>';
  // launch a browser instance with the
  // --proxy-server flag enabled
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyURL}`],
  });
  // open a new page in the current browser context
  const page = await browser.newPage();
  // specify the proxy credentials before
  // visiting the page
  await page.authenticate({
    username: proxyUsername,
    password: proxyPassword,
  });
  // visit the target page
  await page.goto('https://httpbin.org/ip');
  // extract the IP the request comes from
  // and print it
  const body = await page.waitForSelector('body');
  const ip = await body.getProperty('textContent');
  console.log(await ip.jsonValue());
  await browser.close();
};
scraper();
Here's an example of what the output looks like:
{
"origin": "113.61.63.15"
}
Incredible! You now have a scraping proxy solution with Puppeteer's capabilities, and it's even more effective!
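To keep paid proxy credentials out of your source code, you could read them from environment variables instead of hardcoding them. The sketch below is one way to do that; the variable names `PROXY_USERNAME` and `PROXY_PASSWORD` are assumptions for this example and are not defined by ZenRows or Puppeteer.

```javascript
// Hypothetical helper: read proxy credentials from an environment object
// (process.env by default) and fail early if either one is missing.
const loadProxyCredentials = (env = process.env) => {
  const username = env.PROXY_USERNAME; // assumed variable name
  const password = env.PROXY_PASSWORD; // assumed variable name
  if (!username || !password) {
    throw new Error('Set PROXY_USERNAME and PROXY_PASSWORD before running');
  }
  return { username, password };
};

// the returned object has the shape page.authenticate() expects:
// await page.authenticate(loadProxyCredentials());
```

Failing early with a clear message is usually easier to debug than a 407 response from the proxy later in the run.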
Troubleshooting Puppeteer Proxy Server Issues
While setting up a Puppeteer proxy, you might encounter errors due to misconfigurations, connectivity issues, an unreachable server, or an incorrect proxy address. If your proxy connection keeps failing, here's how to troubleshoot it.
- Validate Proxy Setup: The first troubleshooting step is to run your Puppeteer scraper without a proxy configuration. If Puppeteer runs successfully without an error, your proxy server setup may be faulty. Verify your proxy address, and ensure the proxy option is passed correctly into the puppeteer.launch function.
- Verify Authentication Credentials: An HTTP error 407 while using an authenticated proxy specifically signals authentication issues. To resolve this issue, ensure you correctly input proxy credentials, such as username and password.
- Confirm Proxy Accessibility: Use tools like cURL or telnet to test the proxy connection and confirm whether the server is available and accessible.
- Enable Detailed Debugging: Use verbose logging or run Puppeteer with devtools: true to capture detailed connection logs from the browser instance console.
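The accessibility check can also be done from Node itself before launching the browser. The following is a minimal sketch using the built-in `net` module; `isProxyReachable` is a hypothetical helper, and the 5-second timeout is an arbitrary default. It only confirms that the port accepts TCP connections, not that the proxy will accept your credentials or forward traffic.

```javascript
const net = require('node:net');

// Hypothetical helper: resolve to true if a TCP connection to host:port
// opens within timeoutMs, and to false on error or timeout.
const isProxyReachable = (host, port, timeoutMs = 5000) =>
  new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (reachable) => {
      socket.destroy();
      resolve(reachable);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });

// usage:
// isProxyReachable('160.86.242.23', 8080).then(console.log);
```

A `false` result points to a connectivity or address problem rather than an authentication one, which helps narrow down which of the steps above to revisit.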
Conclusion
This step-by-step tutorial explained how to configure a proxy in Puppeteer, from basic setup to proxy rotation, authentication, and troubleshooting common issues. Rotating proxies increases your chances of bypassing blocks by distributing traffic across several IPs.
Remember that free proxies are unreliable. It's best to opt for premium proxies, such as the ZenRows residential proxies, which offer autorotation and geo-location features out of the box.
Try ZenRows for free today without a credit card!