The most common challenge when web scraping in C# is getting blocked. Websites often implement various techniques to regulate bot traffic and deny web scraper access. However, the good news is you can overcome this issue by using a proxy with PuppeteerSharp.
Proxies act as intermediaries between you and the target website, letting you route requests through different IP addresses and geographic locations. You'll learn how to get and configure a PuppeteerSharp proxy in this tutorial.
How to Set a Proxy with PuppeteerSharp
Here's the step-by-step process of setting up a proxy with PuppeteerSharp. You'll also learn how to rotate multiple proxies to increase your chances of avoiding detection.
Step 1: Get Started with PuppeteerSharp
Let's begin with a basic PuppeteerSharp scraper that makes an HTTP request to a target website.
The following script launches a headless browser, creates a new page, navigates to the target URL (httpbin, an API that returns the web client's IP address), retrieves the page content, and prints it to the console.
using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Download the browser binaries PuppeteerSharp needs
        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        // Launch a headless browser and open a new page
        await using var browser = await Puppeteer.LaunchAsync(
            new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();

        // Navigate to the target URL
        await page.GoToAsync("https://httpbin.io/ip");

        // Get the content of the page
        var pageContent = await page.GetContentAsync();

        // Print the page content
        Console.WriteLine(pageContent);

        // Close the browser when done
        await browser.CloseAsync();
    }
}
Note: The code above assumes you've created a console application project (for example, with dotnet new console) and installed PuppeteerSharp (dotnet add package PuppeteerSharp).
Run it, and the result of the request should be your machine's IP address.
{
  "origin": "107.10.84.20"
}
Let's use a proxy next.
Step 2: Set a PuppeteerSharp Proxy
To follow along in this step, you need a proxy, and you can grab a free one from FreeProxyList. Note that we recommend HTTPS proxies because they work for both HTTP and HTTPS requests.
To configure a PuppeteerSharp proxy, you must pass your proxy details as a command-line argument. For that, PuppeteerSharp provides the LaunchAsync method, which allows you to specify various options using the Args property (an array of strings), including proxy settings.
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { "--proxy-server=<PROXY_IP_ADDRESS>:<PROXY_PORT>" }
});
Now, replace <PROXY_IP_ADDRESS>:<PROXY_PORT> with your proxy details (in our case, 8.219.97.248:80, but you'll need to pick a fresh one), add it to the basic request we created earlier, and you'll have the following complete code.
using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Initialize a browser fetcher to download PuppeteerSharp binaries
        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        // Launch a headless browser instance with specified options
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true, // Run the browser in headless mode (no GUI)
            Args = new[] { "--proxy-server=8.219.97.248:80" } // Configure the proxy server
        });

        // Create a new web page
        await using var page = await browser.NewPageAsync();

        // Navigate to the target URL using the configured proxy (no proxy authentication in this code)
        await page.GoToAsync("https://httpbin.io/ip");

        // Retrieve the content of the web page
        var pageContent = await page.GetContentAsync();

        // Print the page content (in this case, the IP address)
        Console.WriteLine(pageContent);

        // Close the browser when finished
        await browser.CloseAsync();
    }
}
Run the script, and your response should be your proxy's IP address.
{
  "origin": "8.219.97.248:52913"
}
Congrats! You've configured your first PuppeteerSharp proxy.
That said, it's worth noting that free proxies are unreliable, and real-world use cases mostly demand premium web scraping proxies, which often require additional configuration. Let's see how to implement such proxies in PuppeteerSharp.
Step 3: Do Proxy Authentication with PuppeteerSharp
Premium proxies often require you to provide valid credentials, such as a username and password, before you can use their service. This is necessary for security and access control on the part of the proxy providers.
To authenticate a proxy with PuppeteerSharp, you must provide the credentials to the AuthenticateAsync method of the page object returned by NewPageAsync.
await page.AuthenticateAsync(new Credentials { Username = "<YOUR_USERNAME>", Password = "<YOUR_PASSWORD>" });
So, if the proxy in step 2 were premium, you'd authenticate it by modifying your code to include the credentials using the AuthenticateAsync method, like in the code below.
using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Initialize a browser fetcher to download PuppeteerSharp binaries
        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        // Launch a headless browser instance with specified options
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true, // Run the browser in headless mode (no GUI)
            Args = new[] { "--proxy-server=8.219.97.248:80" } // Configure the proxy server
        });

        // Create a new web page
        await using var page = await browser.NewPageAsync();

        // Authenticate with the proxy server using the proxy credentials
        await page.AuthenticateAsync(new Credentials { Username = "<YOUR_USERNAME>", Password = "<YOUR_PASSWORD>" });

        // Navigate to the target URL using the configured proxy
        await page.GoToAsync("https://httpbin.io/ip");

        // Retrieve the content of the web page
        var pageContent = await page.GetContentAsync();

        // Print the page content (in this case, the IP address)
        Console.WriteLine(pageContent);

        // Close the browser when finished
        await browser.CloseAsync();
    }
}
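Premium providers often give you the proxy as a single connection URL, such as http://username:password@host:port (the exact format varies by provider). If yours does, here's a minimal sketch, using a placeholder URL, of splitting it into the --proxy-server argument and the Credentials object used above:
using PuppeteerSharp;
using System;

class ProxyConfigExample
{
    static void Main()
    {
        // Placeholder proxy URL; substitute the host, port, and credentials your provider gives you
        var proxyUrl = new Uri("http://myUsername:myPassword@proxy.example.com:8080");

        // The host and port go into the --proxy-server launch argument
        string proxyServerArg = $"--proxy-server={proxyUrl.Host}:{proxyUrl.Port}";

        // The username and password go into the Credentials object passed to AuthenticateAsync
        string[] userInfo = proxyUrl.UserInfo.Split(':', 2);
        var credentials = new Credentials { Username = userInfo[0], Password = userInfo[1] };

        Console.WriteLine(proxyServerArg);
    }
}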
Step 4: Build a Proxy Rotator
Too many requests from the same IP address are easily flagged as suspicious activity, and you can get blocked. You can avoid that by rotating through multiple proxies: by distributing requests across different IP addresses, you keep the volume coming from any single one low.
To build a proxy rotator in PuppeteerSharp, first, you need a proxy list, from which you'll randomly select one for each request. You can grab a few from FreeProxyList.
Start by defining your proxy list.
using PuppeteerSharp;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Define a list of proxy addresses
        var proxies = new List<string>
        {
            "http://34.140.70.242:8080",
            "http://118.69.111.51:8080",
            "http://15.204.161.192:18080",
            "http://186.121.235.66:8080",
        };

        //..
    }
}
Next, generate a random index and use it to select a proxy from your proxy list.
//..
// Generate a random index
var random = new Random();
int randomIndex = random.Next(proxies.Count);
// Select a random proxy using randomIndex
string randomProxy = proxies[randomIndex];
After that, create a new PuppeteerSharp browser instance, passing the randomly selected proxy through the Args property of LaunchOptions.
//..
// Launch a browser instance with randomProxy
using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { $"--proxy-server={randomProxy}" }
});
Lastly, update the basic request created earlier with the code blocks above, and you'll have the following complete code.
using PuppeteerSharp;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Define a list of proxy addresses
        var proxies = new List<string>
        {
            "http://34.140.70.242:8080",
            "http://118.69.111.51:8080",
            "http://15.204.161.192:18080",
            "http://186.121.235.66:8080",
        };

        // Generate a random index
        var random = new Random();
        int randomIndex = random.Next(proxies.Count);

        // Select a random proxy using the randomIndex
        string randomProxy = proxies[randomIndex];

        // Launch a browser instance with randomProxy
        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            Args = new[] { $"--proxy-server={randomProxy}" }
        });

        // Create a new page
        await using var page = await browser.NewPageAsync();

        // Navigate to the target URL
        await page.GoToAsync("https://httpbin.io/ip");

        // Retrieve the page content
        var pageContent = await page.GetContentAsync();
        Console.WriteLine(pageContent);

        // Close the browser
        await browser.CloseAsync();
    }
}
To verify it works, make multiple requests. You should get a different IP address for each. Here's the result for two requests:
{
  "origin": "34.140.70.242"
}
//..
{
  "origin": "186.121.235.66"
}
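If you'd rather script that check instead of re-running the program by hand, here's a minimal sketch that loops a few times (the count of three is arbitrary), launching a fresh browser through a randomly selected proxy on each iteration:
using PuppeteerSharp;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Same proxy list as above
        var proxies = new List<string>
        {
            "http://34.140.70.242:8080",
            "http://118.69.111.51:8080",
            "http://15.204.161.192:18080",
            "http://186.121.235.66:8080",
        };

        // Download the browser binaries once, outside the loop
        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        var random = new Random();

        // Make a few requests, each through a randomly selected proxy
        for (int i = 0; i < 3; i++)
        {
            string randomProxy = proxies[random.Next(proxies.Count)];

            await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
            {
                Headless = true,
                Args = new[] { $"--proxy-server={randomProxy}" }
            });
            await using var page = await browser.NewPageAsync();

            await page.GoToAsync("https://httpbin.io/ip");
            Console.WriteLine(await page.GetContentAsync());
        }
    }
}
Each iteration should print a different IP, provided the free proxies in the list are still alive.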
Awesome! You can now easily rotate proxies in PuppeteerSharp.
However, we only used free proxies to show you the basics. As mentioned before, they're unreliable and easily detected by websites. Keep reading for a better-performing solution.
Step 5: The Way to Rotate Proxies in a Real Scenario
In the previous step, we implied that free proxies can easily get blocked. Let's see how they fare in a real-world example (scraping an actual website). In this example, we'll try to scrape G2, a Cloudflare-protected website.
So, replace the target URL in step 4 with https://www.g2.com/. Run your script, and you'll get an error message like the one below.
<!DOCTYPE html><html class="no-js" lang="en-US"><!--<![endif]--><head>
<title>Attention Required! | Cloudflare</title>
</head>
<body>
<div>
<h1 ...>Sorry, you have been blocked</h1>
<h2 ...><span data-translate="unable_to_access">You are unable to access</span> g2.com</h2>
</div>
//..
That proves that real-world scenarios require premium proxies. Two types of proxies are most used for scraping: residential and datacenter. Residential proxies are the recommended option because they use IP addresses associated with real residential devices, making it difficult for websites to detect them as bots.
To get started, you can check out our list of the best web scraping proxy providers.
That said, configuring premium proxies with PuppeteerSharp can get tedious and difficult to scale. And proxies alone aren't enough for popular websites. Fortunately, you can make things easier by complementing PuppeteerSharp with ZenRows, a web scraping API that offers a residential proxy rotator, as well as all the features you need to avoid getting blocked, including User Agent rotation, anti-CAPTCHA, and more.
Let's see ZenRows in action scraping G2, a well-protected site. To get started, sign up for free, and you'll get to the Request Builder page.
Paste your target URL (https://www.g2.com/), and check the box for Premium Proxies to auto-rotate your IP address. Click on the JS Rendering boost mode. Then, select C# as the language you'll use to get your request code generated on the right.
You'll see that RestSharp is suggested, but you can absolutely use PuppeteerSharp. You only need to send a request to the ZenRows API. For that, copy the ZenRows API URL from the generated request on the right and define it in your PuppeteerSharp code.
https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2F&js_render=true&premium_proxy=true
Then, make a request to the ZenRows API URL. Your code should look like this:
using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Define the ZenRows API URL
        string zenRowsApiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2F&js_render=true&premium_proxy=true";

        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true
        });
        await using var page = await browser.NewPageAsync();

        // Send a request to the ZenRows API
        await page.GoToAsync(zenRowsApiUrl);

        // Retrieve the page content
        var pageContent = await page.GetContentAsync();
        Console.WriteLine(pageContent);

        await browser.CloseAsync();
    }
}
Run the code, and you'll get G2's HTML.
<!DOCTYPE html>
//..
<title id="icon-label-55be01c8a779375d16cd458302375f4b">G2 - Business Software Reviews</title>
//..
<h1 ...id="main">Where you go for software.</h1>
Easy, right? That's how simple it is to rotate proxies at scale with ZenRows.
However, most modern websites use sophisticated anti-bot techniques, and rotating proxies and headers alone isn't enough. In that case, you can replace PuppeteerSharp with ZenRows entirely to avoid getting blocked.
ZenRows offers the same headless browser functionality as PuppeteerSharp but with the complete toolkit for bypassing any anti-bot system and less overhead. So, you even get to save machine costs as the headless browser is run by ZenRows.
To use ZenRows alone, you only need to copy the generated code on the right for C#.
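If you'd prefer not to use the generated RestSharp snippet, a plain HttpClient call to the same API URL also works. Here's a minimal sketch, assuming the same apikey, js_render, and premium_proxy parameters as the request above:
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Same ZenRows API URL as before; replace <YOUR_ZENROWS_API_KEY> with your key
        string zenRowsApiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2F&js_render=true&premium_proxy=true";

        using var client = new HttpClient();
        // Allow extra time, since the API renders JavaScript before responding
        client.Timeout = TimeSpan.FromMinutes(3);

        // The response body is the rendered HTML of the target page
        string html = await client.GetStringAsync(zenRowsApiUrl);
        Console.WriteLine(html);
    }
}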
Conclusion
Configuring a PuppeteerSharp proxy enables you to route your requests through a different IP address. However, making too many requests from a specific IP address can lead to an IP ban. So, you must rotate through multiple proxies for better results.
That being said, building a PuppeteerSharp proxy rotator can get really tedious and difficult to scale. Fortunately, ZenRows offers an easy way out: an intuitive API that handles everything under the hood, including rotating premium residential proxies. Sign up now to try it for free.