Proxies can be helpful for various reasons, including bypassing IP restrictions and avoiding detection by anti-bot systems. In this tutorial, you'll learn how to set up a PowerShell proxy using Invoke-WebRequest, a cmdlet for making requests in PowerShell.
Using Invoke-WebRequest with a proxy enables you to route your requests through an intermediary server, disguising your web activity.
Read on as we explore everything you need to know about configuring proxies when web scraping with PowerShell.
Step 1: Set up a Proxy With PowerShell and Invoke-WebRequest
The Invoke-WebRequest cmdlet in PowerShell is a powerful tool that allows you to send HTTP and HTTPS requests to a web page or API. It automatically parses the response and returns a structured object with various properties for extracting parts of the HTML response.
Here's a basic example of a GET request using Invoke-WebRequest.
# make GET request
$response = Invoke-WebRequest -URI https://httpbin.io/ip
# print response content
$response.Content
This command makes a request to a demo website that returns the client's IP address.
{
"origin": "108.225.67.62:33907"
}
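Because the body is a JSON string, you can convert it into an object and pull out just the IP address. The snippet below is a minimal sketch that works on a hard-coded copy of the sample response above rather than on a live request; the variable names are illustrative.

```powershell
# Sample body copied from the response above (a live request would use $response.Content)
$content = '{ "origin": "108.225.67.62:33907" }'

# Convert the JSON text into a PowerShell object
$parsed = $content | ConvertFrom-Json

# The origin field has the form "ip:port"; keep only the IP part
$ip = ($parsed.origin -split ':')[0]
$ip
```

With a real request, replace the hard-coded string with $response.Content.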
If you're new to web scraping in PowerShell or need a refresher, check out this guide to PowerShell web scraping.
Now, let's move to a step-by-step guide on how to configure a PowerShell proxy.
Invoke-WebRequest offers the -Proxy parameter for specifying a proxy in your request. Here's how to use it in PowerShell:
# make GET request through a proxy
$response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy http://47.252.29.28:11222
# print response content
$response.Content
This command routes your request through the specified proxy, so the site now reports the proxy's IP address instead of yours.
{
"origin": "47.252.29.28:10683"
}
Well done!
We used a proxy from the Free Proxy List website. However, it may not work at the time of reading as free proxies are generally short-lived and only suitable for learning purposes. In case of an error, just grab a new one from the site.
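Since a dead proxy makes Invoke-WebRequest fail, it can help to wrap the request in try/catch with a timeout. The following is a rough sketch, not a definitive implementation: Test-Proxy is a hypothetical helper name, and the 10-second default timeout is an arbitrary assumption.

```powershell
# Hypothetical helper: returns $true if a request through the proxy succeeds
function Test-Proxy {
    param(
        [Parameter(Mandatory)][string]$Proxy,
        [string]$Uri = 'https://httpbin.io/ip',
        [int]$TimeoutSec = 10
    )
    try {
        # -ErrorAction Stop makes sure any failure lands in the catch block
        Invoke-WebRequest -Uri $Uri -Proxy $Proxy -TimeoutSec $TimeoutSec -ErrorAction Stop | Out-Null
        return $true
    }
    catch {
        Write-Warning "Proxy $Proxy failed: $($_.Exception.Message)"
        return $false
    }
}
```

For example, Test-Proxy -Proxy http://47.252.29.28:11222 returns $false once that free proxy has gone offline, which tells you it's time to grab a new one.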
Proxy Authentication
Some proxies require additional information, such as a username and password, to control access to their proxy servers. In PowerShell, a combination of the -ProxyCredential parameter and the Get-Credential cmdlet allows you to include this data in your request.
Get-Credential prompts you to enter the proxy username and password and creates a credential object, while the -ProxyCredential parameter allows you to pass this object to Invoke-WebRequest.
First, create a prompt for your proxy credentials using Get-Credential.
# create a prompt for your credentials
$cred = Get-Credential -Message 'Please enter your username and password for the proxy server.'
Then, pass the credentials in your request using -ProxyCredential, as in the code snippet below.
# pass proxy credentials as parameters in your request
if ($cred) {
    $response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy http://47.252.29.28:11222 -ProxyCredential $cred
}
Lastly, put everything together and verify it works using a print statement.
# create a prompt for your credentials
$cred = Get-Credential -Message 'Please enter your username and password for the proxy server.'
# pass proxy credentials as parameters in your request
if ($cred) {
    $response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy http://47.252.29.28:11222 -ProxyCredential $cred
}
# print response content
$response.Content
You'll be prompted to enter your username and password, and the result will be your proxy server's IP address.
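An interactive prompt doesn't suit unattended scripts. One possible alternative is to build the credential object yourself. The sketch below assumes the username and password live in environment variables named PROXY_USER and PROXY_PASS (names chosen for illustration), and New-ProxyCredential is a hypothetical helper, not a built-in cmdlet.

```powershell
# Hypothetical helper: builds a credential object without an interactive prompt
function New-ProxyCredential {
    param(
        [Parameter(Mandatory)][string]$UserName,
        [Parameter(Mandatory)][string]$Password
    )
    # PSCredential expects the password as a SecureString
    $securePassword = ConvertTo-SecureString $Password -AsPlainText -Force
    return New-Object System.Management.Automation.PSCredential ($UserName, $securePassword)
}

# PROXY_USER and PROXY_PASS are assumed environment variable names
if ($env:PROXY_USER -and $env:PROXY_PASS) {
    $cred = New-ProxyCredential -UserName $env:PROXY_USER -Password $env:PROXY_PASS
    $response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy http://47.252.29.28:11222 -ProxyCredential $cred
    $response.Content
}
```

Keeping the password out of the script file is the main point here; how you store it securely depends on your environment.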
Best Proxy Protocol: HTTP, HTTPS, SOCKS
The best proxy protocol depends on your project requirements and use cases. HTTP and HTTPS protocols are ideal for web scraping tasks as they're specific to HTTP and HTTPS traffic.
However, HTTPS proxies are recommended due to the additional layer of security they provide by encrypting traffic between the client and proxy server.
On the other hand, the SOCKS proxy protocol is more versatile and handles various protocols, including FTP, IMAP, SMTP, etc.
The command structure stays the same whichever protocol you use. However, HTTPS and SOCKS proxies require PowerShell 7 or later.
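To guard against running a script on a version that can't handle the chosen proxy, you could check the proxy URL's scheme up front. This is only a sketch of the version rule stated above: Test-ProxySchemeSupported is a hypothetical helper, and the socks4, socks4a, and socks5 scheme names are an assumption about how the SOCKS proxy URL is written.

```powershell
# Hypothetical helper: checks a proxy URL's scheme against the PowerShell major version,
# following the rule stated above (HTTPS and SOCKS proxies need PowerShell 7 or later)
function Test-ProxySchemeSupported {
    param(
        [Parameter(Mandatory)][string]$Proxy,
        [int]$MajorVersion = $PSVersionTable.PSVersion.Major
    )
    # Let System.Uri extract the scheme part of the proxy URL
    $scheme = ([uri]$Proxy).Scheme
    if ($scheme -eq 'http') { return $true }
    if ($scheme -in 'https', 'socks4', 'socks4a', 'socks5') { return $MajorVersion -ge 7 }
    return $false
}
```

Calling Test-ProxySchemeSupported -Proxy 'socks5://127.0.0.1:1080' (a placeholder address) before the request lets the script fail early with a clear message instead of a confusing connection error.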
Step 2: Use Rotating Proxies With PowerShell
While configuring a proxy allows you to route your requests through the specified proxy server, relying on a single server can quickly become problematic as websites often block IP addresses that make multiple requests in a short period.
To avoid these restrictions, you must rotate proxies. This involves switching between IP addresses periodically or per request to mimic organic traffic.
One way to achieve this is by creating a proxy pool and randomly selecting one for each request. Let's see how to do it in PowerShell.
Define a proxy list. To follow along in this example, grab yours from Free Proxy List.
# define a list of proxy servers
$proxies = @(
    "http://189.240.60.168:9090",
    "http://197.255.126.69:80",
    "http://162.223.90.130:80"
)
Next, select one proxy randomly from the list. You can create a function to keep your code clean and readable.
# function to randomly select a proxy from the list
function Get-RandomProxy {
    return $proxies | Get-Random
}
After that, make your request using the selected proxy.
# select a proxy, then pass it as a parameter in your request
$proxy = Get-RandomProxy
$response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy $proxy
Put everything together and verify your code works using a print statement.
# define a list of proxy servers
$proxies = @(
    "http://189.240.60.168:9090",
    "http://197.255.126.69:80",
    "http://162.223.90.130:80"
)
# function to randomly select a proxy from the list
function Get-RandomProxy {
    return $proxies | Get-Random
}
$proxy = Get-RandomProxy
# pass the selected proxy as a parameter in your request.
$response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy $proxy
Write-Output "Using proxy: $proxy"
# print response content
$response.Content
Each run picks a proxy at random, so the IP address changes from run to run, although the same proxy can come up twice in a row. Here are the results for two requests.
Using proxy: http://189.240.60.168:9090
{
"origin": "189.240.60.168:10683"
}
Using proxy: http://162.223.90.130:80
{
"origin": "162.223.90.130:11401"
}
Congratulations, you've just built a proxy rotator!
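Random selection alone stops at the first dead proxy. A possible refinement is to shuffle the pool and try each proxy until one responds. The sketch below is one way to do that under a few assumptions: Invoke-WithProxyRotation is a hypothetical function name, the 10-second timeout is arbitrary, and the request logic is passed in as a script block so it can be swapped out.

```powershell
# Hypothetical helper: tries each proxy in random order until one request succeeds
function Invoke-WithProxyRotation {
    param(
        [Parameter(Mandatory)][string[]]$Proxies,
        [Parameter(Mandatory)][string]$Uri,
        # Request logic is injectable; by default it calls Invoke-WebRequest
        [scriptblock]$Request = {
            param($Uri, $Proxy)
            Invoke-WebRequest -Uri $Uri -Proxy $Proxy -TimeoutSec 10 -ErrorAction Stop
        }
    )
    # Asking Get-Random for as many items as the list holds returns them in shuffled order
    $shuffled = $Proxies | Get-Random -Count $Proxies.Count
    foreach ($candidate in $shuffled) {
        try {
            $response = & $Request $Uri $candidate
            # Report which proxy worked alongside the response
            return [pscustomobject]@{ Proxy = $candidate; Response = $response }
        }
        catch {
            Write-Warning "Proxy $candidate failed: $($_.Exception.Message)"
        }
    }
    throw "All proxies failed for $Uri"
}
```

With the $proxies list defined earlier, a call looks like this: $result = Invoke-WithProxyRotation -Proxies $proxies -Uri https://httpbin.io/ip, after which $result.Proxy holds the proxy that worked and $result.Response.Content holds the body.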
However, keep in mind that the example above targeted a demo website and may not work as well in production. Free proxies are easily detected and blocked by anti-bot systems, and they can be unreliable, slow, or unusable on more sophisticated websites. Let's see what works next.
Use Premium Proxies
While free proxies are handy in tutorials and may work sometimes, they're too unreliable for large-scale web scraping.
In production, you should use residential premium proxies. Residential proxies are IP addresses associated with an actual user's device, which makes you appear like a natural user when you route your requests through them.
ZenRows, one of the best premium proxy providers, automatically rotates these residential IPs to increase the effectiveness of your scraper. This allows you to scrape any website without getting blocked.
Let's see a step-by-step guide on integrating ZenRows premium proxy in your scraper.
To use ZenRows, sign up to access your dashboard. Select Residential Proxies in the left menu section and create a new proxy user. You'll be directed to the Proxy Generator page.
Copy your proxy username, password, domain, and port.
Here's the final code using ZenRows premium proxies:
# create a prompt for your credentials
$cred = Get-Credential -Message 'Please enter your username and password for the proxy server.'
# pass proxy credentials as parameters in your request
if ($cred) {
    $response = Invoke-WebRequest -URI https://httpbin.io/ip -Proxy http://superproxy.zenrows.com:1337 -ProxyCredential $cred
}
# print response content
$response.Content
ZenRows' premium proxies require authentication, so after running this code, you'll be prompted to enter the proxy username and password.
You'll get output similar to this:
{
"origin": "200.88.78.215:32922"
}
This response indicates that your query was effectively channeled through ZenRows' premium residential proxy. The displayed IP address will differ from your original IP, verifying that the proxy mechanism is functioning as intended.
What's the Alternative to Invoke-WebRequest?
Apart from Invoke-WebRequest, PowerShell offers the Invoke-RestMethod cmdlet for making HTTP and HTTPS requests to REST (Representational State Transfer) web services. These services often return rich structured data, which PowerShell formats according to the data type.
For example, PowerShell converts the content of JSON or XML responses into in-memory [PSCustomObject] objects.
Below is an example of retrieving JSON data from an API endpoint using Invoke-RestMethod.
# make the HTTP GET request
$response = Invoke-RestMethod -Uri https://httpbin.io/ip
# output the response
$response
Here, Invoke-RestMethod retrieves the JSON data and automatically deserializes it into a PowerShell object, which is displayed as the following formatted result.
origin
------
108.015.57.152:6351
In essence, both cmdlets can interact with web services, but they differ in how they handle responses. Invoke-RestMethod's ability to interact with services that return structured data, such as JSON, XML, and RSS or ATOM feeds, makes it suitable for web scraping tasks that involve APIs or endpoints.
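Invoke-RestMethod accepts the same -Proxy and -ProxyCredential parameters as Invoke-WebRequest, so the proxy setup from earlier carries over. The following is a small sketch; Get-OriginViaProxy is a hypothetical helper name, and the endpoint is the demo site used throughout this tutorial.

```powershell
# Hypothetical helper: returns the "origin" field reported by httpbin.io through a proxy
function Get-OriginViaProxy {
    param(
        [Parameter(Mandatory)][string]$Proxy,
        [System.Management.Automation.PSCredential]$Credential
    )
    # Collect the parameters in a hashtable so the credential stays optional
    $params = @{ Uri = 'https://httpbin.io/ip'; Proxy = $Proxy; ErrorAction = 'Stop' }
    if ($Credential) { $params.ProxyCredential = $Credential }
    # Invoke-RestMethod deserializes the JSON body, so the field is available directly
    (Invoke-RestMethod @params).origin
}
```

For instance, Get-OriginViaProxy -Proxy http://47.252.29.28:11222 prints the proxy's address without a separate JSON parsing step.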
Conclusion
Configuring premium proxies is critical to avoid web scraping challenges such as IP bans, rate limiting, and detection by anti-bot systems.
While a single proxy can get you started, websites often ban IP addresses that make multiple requests in a short period of time. For uninterrupted web scraping, you should use high-quality rotating proxies.
To save yourself the hassle of creating a proxy rotator from scratch, consider ZenRows. Apart from auto-rotating premium proxies, ZenRows provides a whole web scraping toolkit that enables you to scrape without getting blocked. Try it for free now!