You've probably had your IP address blocked while trying to scrape information from a website. Annoying, right?
Luckily, you can hide your IP using a proxy and access the website as if you were someone else. This can help you handle anti-scraping measures. And Guzzle, the popular PHP HTTP client, makes that easy.
So let's dive in and learn how to use a proxy with Guzzle.
What Is a Guzzle Proxy?
When using Guzzle, a proxy is a server that stands between your client and the destination server. It sends your request to the target server and gives your client the response.
A proxy can also help you get around IP bans that block web scraping or restrict website access, cache responses, and reduce the number of requests that reach the destination server.
Let's see what you need to use a Guzzle proxy properly.
Prerequisites
Before moving on, ensure you have PHP >= v7.2.5 and Composer installed. Also, following this tutorial will be easier if you understand the basics of web scraping in PHP.
Create a demo directory and install Guzzle in it:
composer require guzzlehttp/guzzle
Then, create a demo PHP file in the newly created directory and require Composer's autoloader:
<?php
# composer's autoloader
require 'vendor/autoload.php';
Now, let's set the proxy.
How to Use a Proxy with Guzzle
In this section, you'll learn to send a request with Guzzle using a proxy and to authenticate the proxy. You'll need some proxies to begin with, so get some from a free proxy list.
Make sure you pick valid proxies with decent uptime and that each proxy URL follows the format below:
<PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT>
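Before handing a proxy from a free list to Guzzle, it can help to check that it really has this shape. Here's a minimal sketch using PHP's built-in parse_url. The function name is_valid_proxy_url and the list of accepted schemes are assumptions made for illustration, not part of Guzzle:

```php
<?php
# Hypothetical helper (not part of Guzzle): checks that a proxy URL has the
# <PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT> shape.
function is_valid_proxy_url(string $proxyUrl): bool
{
    $parts = parse_url($proxyUrl);
    if ($parts === false) {
        return false;
    }
    # protocol, host and port must all be present
    foreach (['scheme', 'host', 'port'] as $key) {
        if (empty($parts[$key])) {
            return false;
        }
    }
    # assumption: only these proxy protocols are of interest here
    $allowedSchemes = ['http', 'https', 'socks5'];
    return in_array(strtolower($parts['scheme']), $allowedSchemes, true);
}

var_dump(is_valid_proxy_url('http://190.43.92.130:999')); # bool(true)
var_dump(is_valid_proxy_url('190.43.92.130:999'));        # bool(false)
```

This only checks the format. It doesn't tell you whether the proxy is actually online.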
Keep in mind that you can use a proxy with Guzzle in two ways: with request-options or with middleware.
Overall, if you need a simple, static proxy configuration, request-options is a good choice. Middleware gives you more flexibility and control over proxy behavior, but it requires more setup.
Let's explore these two methods.
Method A: Set a Guzzle Proxy with request-options
To set a proxy with request-options, start by importing Guzzle's Client and RequestOptions classes:
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
Then, define your target URL and an associative array of all the proxies you'll use:
# make request to
$targetUrl = 'https://httpbin.org/ip';
# proxies
$proxies = [
    'http' => 'http://190.43.92.130:999',
    'https' => 'http://5.78.76.237:8080',
];
The target URL defined above is HTTPBin, which returns the IP address of any client that makes a GET request to it. Guzzle will handle HTTP and HTTPS traffic via the HTTP and HTTPS proxies defined above.
Free proxies are only active for a limited time. Those listed above probably won't work for you, so replace them with new ones. Aside from HTTP and HTTPS, Guzzle supports other proxy protocols, including SOCKS5.
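As a rough illustration of mixing protocols, the array below routes HTTPS traffic through a SOCKS5 proxy and skips the proxy for a few hosts. It assumes Guzzle's default cURL handler, which is what interprets the socks5:// scheme. The SOCKS5 address and port are placeholders, and the hosts in the no list are made up:

```php
<?php
# Sketch of a proxies array that mixes protocols. The SOCKS5 address and
# port are placeholders; the "no" hosts are examples only.
$proxies = [
    'http'  => 'http://190.43.92.130:999',    # HTTP traffic goes through this proxy
    'https' => 'socks5://203.0.113.10:1080',  # placeholder SOCKS5 proxy for HTTPS traffic
    'no'    => ['localhost', '.example.com'], # hosts that should not be proxied
];
```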
Now, let's create a Guzzle client and pass the proxies we defined as the value for the Guzzle proxy option:
$client = new Client([
    RequestOptions::PROXY => $proxies,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);
Free proxies often cause SSL certificate verification to fail, so the verify option is set to false here to disable it. That leaves the connection open to tampering, so only do this for testing. The timeout option caps each request at thirty seconds.
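If you'd rather keep certificate checks on by default, one option is to build the options array in a small function and only switch verification off when you ask for it. This is a sketch under assumptions: the keys are Guzzle's request option names, while the function name, its parameters, and the ten-second connect_timeout value are hypothetical choices:

```php
<?php
# Sketch: build the client options separately so SSL certificate validation
# stays on unless explicitly disabled. The function name is hypothetical.
function build_client_options(array $proxies, bool $insecure = false): array
{
    return [
        'proxy'           => $proxies,
        'verify'          => !$insecure, # true keeps SSL certificate validation on
        'timeout'         => 30,         # total seconds allowed per request
        'connect_timeout' => 10,         # seconds allowed to establish the connection
    ];
}

$options = build_client_options(['https' => 'http://5.78.76.237:8080']);
# $options can then be passed to `new Client($options)`
```

Calling build_client_options($proxies, true) gives the same array with verification turned off, which mirrors the client configuration shown above.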
Then, let's make the request and print the response:
try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}
At this point, your PHP script should look like this:
<?php
# composer's autoloader
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
# make request to
$targetUrl = 'https://httpbin.org/ip';
# proxies
$proxies = [
    'http' => 'http://190.43.92.130:999',
    'https' => 'http://5.78.76.237:8080',
];
$client = new Client([
    RequestOptions::PROXY => $proxies,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);
try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}
Run your script using php <filename>.php, and you'll get an output similar to the one below:
{
  "origin": "5.78.76.237"
}
Great! The value of the origin key is the IP address of the client that made the request to HTTPBin. In this case, that should be the IP of the HTTPS proxy you defined, since the target URL uses HTTPS.
If you get an error instead, pick new proxies from the list and try again.
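To check the result in code rather than by eye, you could compare the origin value with the host of the proxy you configured. The helper below is a hypothetical sketch, not part of Guzzle or HTTPBin. Note that some proxies send traffic out from a different address than the one they listen on, so a mismatch doesn't always mean the proxy was skipped:

```php
<?php
# Hypothetical helper: returns true when the "origin" value in an HTTPBin
# response matches the host part of the given proxy URL.
function went_through_proxy(string $httpbinBody, string $proxyUrl): bool
{
    $data = json_decode($httpbinBody, true);
    $proxyHost = parse_url($proxyUrl, PHP_URL_HOST);
    if (!is_array($data) || !isset($data['origin']) || !is_string($data['origin'])) {
        return false;
    }
    if (!is_string($proxyHost)) {
        return false;
    }
    # handle the case where the value holds several comma-separated addresses
    $origins = array_map('trim', explode(',', $data['origin']));
    return in_array($proxyHost, $origins, true);
}

$body = '{"origin": "5.78.76.237"}';
var_dump(went_through_proxy($body, 'http://5.78.76.237:8080'));  # bool(true)
var_dump(went_through_proxy($body, 'http://190.43.92.130:999')); # bool(false)
```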
Method B: Use Middleware
Using middleware to set a Guzzle HTTP proxy is similar to the first method. The only difference is that we'll create and add proxy middleware to the default handler stack.
First, update your imports like this:
# ...
use Psr\Http\Message\RequestInterface;
use GuzzleHttp\HandlerStack;
# ...
Next, create a proxy middleware that intercepts each request and sets the proxies. Add the code below right after your $proxies array:
function proxy_middleware(array $proxies)
{
    return function (callable $handler) use ($proxies) {
        return function (RequestInterface $request, array $options) use ($handler, $proxies) {
            # add proxy to request option
            $options[RequestOptions::PROXY] = $proxies;
            return $handler($request, $options);
        };
    };
}
We can now add the middleware to the default handler stack and update our Guzzle client with the stack:
$stack = HandlerStack::create();
$stack->push(proxy_middleware($proxies));
$client = new Client([
    'handler' => $stack,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);
Your PHP script should look like this:
<?php
# composer's autoloader
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Psr\Http\Message\RequestInterface;
use GuzzleHttp\HandlerStack;
# make request to
$targetUrl = 'https://httpbin.org/ip';
# proxies
$proxies = [
    'http' => 'http://190.43.92.130:999',
    'https' => 'http://5.78.76.237:8080',
];
function proxy_middleware(array $proxies)
{
    return function (callable $handler) use ($proxies) {
        return function (RequestInterface $request, array $options) use ($handler, $proxies) {
            # add proxy to request option
            $options[RequestOptions::PROXY] = $proxies;
            return $handler($request, $options);
        };
    };
}
$stack = HandlerStack::create();
$stack->push(proxy_middleware($proxies));
$client = new Client([
    'handler' => $stack,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);
try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}
Run the PHP script again, and you'll get a result similar to the one from the first method.
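The extra flexibility of middleware shows when the proxy has to change per request. As a rough sketch, the factory below follows the same callable shape as proxy_middleware() but picks a random proxy every time a request passes through. The name random_proxy_middleware and the $proxyPool parameter are hypothetical, and a stand-in handler replaces Guzzle so the example runs on its own:

```php
<?php
# Sketch of a middleware factory that picks a proxy per request.
# random_proxy_middleware and $proxyPool are hypothetical names.
function random_proxy_middleware(array $proxyPool): callable
{
    return function (callable $handler) use ($proxyPool): callable {
        return function ($request, array $options) use ($handler, $proxyPool) {
            # choose one proxy URL and use it for both HTTP and HTTPS traffic
            $proxy = $proxyPool[array_rand($proxyPool)];
            $options['proxy'] = ['http' => $proxy, 'https' => $proxy];
            return $handler($request, $options);
        };
    };
}

# Stand-in handler so the sketch runs without Guzzle: it returns the options
# it received. With Guzzle you would instead call
# $stack->push(random_proxy_middleware($pool));
$fakeHandler = function ($request, array $options) {
    return $options;
};
$pool = ['http://190.43.92.130:999', 'http://5.78.76.237:8080'];
$wrapped = random_proxy_middleware($pool)($fakeHandler);
$seen = $wrapped(null, []);
echo $seen['proxy']['https'] . PHP_EOL; # one of the two pool entries
```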
Proxy Authentication with Guzzle
Some proxy servers require client authentication before granting access, which is common when using premium proxies or commercial solutions. If that's your case, add the options for authentication, usually a username and password, to the proxy string.
The syntax of the Guzzle proxy string will then look like this:
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
Here's an example:
# ...
# proxies
$proxies = [
    'http' => 'http://<YOUR_USERNAME>:<YOUR_PASSWORD>@190.43.92.130:999',
    'https' => 'http://<YOUR_USERNAME>:<YOUR_PASSWORD>@5.78.76.237:8080',
];
# ...
An HTTP error with status code 407 (Proxy Authentication Required) will be returned if valid credentials aren't provided, so always ensure you pass the correct ones.
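One common cause of a 407 with correct credentials is a special character, such as @ or :, in the username or password, which breaks the proxy string. A minimal sketch of a fix is to percent-encode both parts before building the URL. This assumes the handler decodes them again, as Guzzle's default cURL handler should. The function name and the credentials below are made up for illustration:

```php
<?php
# Hypothetical helper: builds an authenticated proxy URL and percent-encodes
# the credentials so characters such as "@" or ":" don't break the URL.
function build_auth_proxy_url(
    string $protocol,
    string $username,
    string $password,
    string $host,
    int $port
): string {
    return sprintf(
        '%s://%s:%s@%s:%d',
        $protocol,
        rawurlencode($username),
        rawurlencode($password),
        $host,
        $port
    );
}

# made-up credentials, for illustration only
echo build_auth_proxy_url('http', 'user', 'p@ss:word', '190.43.92.130', 999) . PHP_EOL;
# prints http://user:p%40ss%3Aword@190.43.92.130:999
```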
Use a Rotating Proxy with Guzzle
A rotating proxy is a proxy server that regularly switches between different IP addresses. It can help prevent IP blocking, as each request is sent from a different IP, making it harder for websites to identify bots coming from the same source.
Let's implement a rotating proxy with Guzzle, with a free solution first and then with a professional one.
Rotate IPs with a Free Solution
We'll start with a scraper that sends each request through a random proxy from a list of free proxies and retries, up to a set maximum number of attempts, until it succeeds.
First, write a function that returns random proxies:
function get_random_proxies(): array {
    $http_proxies = array(
        'http://190.43.92.130:999',
        'http://201.182.251.142:999',
        # ...
        'http://200.123.15.250:999'
    );
    $https_proxies = array(
        'http://5.78.76.237:8080',
        'http://8.218.239.205:8888',
        # ...
        'http://169.55.89.6:80'
    );
    $http_proxy = $http_proxies[array_rand($http_proxies)];
    $https_proxy = $https_proxies[array_rand($https_proxies)];
    # proxies
    $proxies = [
        'http' => $http_proxy,
        'https' => $https_proxy,
    ];
    return $proxies;
}
Get new proxies from the Free Proxy List or our list of best proxies and update the $http_proxies and $https_proxies arrays, respectively.
Now, add the function that sends the request with retries, and call it:
function rotating_proxy_request(string $http_method, string $targetUrl, int $max_attempts = 3): ?string
{
    $response = null;
    $attempts = 1;
    while ($attempts <= $max_attempts) {
        $proxies = get_random_proxies();
        echo "Using proxy: ".json_encode($proxies).PHP_EOL;
        $client = new Client([
            RequestOptions::PROXY => $proxies,
            RequestOptions::VERIFY => false, # disable SSL certificate validation
            RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
        ]);
        try {
            $body = $client->request(strtoupper($http_method), $targetUrl)->getBody();
            $response = $body->getContents();
            break;
        } catch (\Exception $e) {
            echo $e->getMessage().PHP_EOL;
            echo "Attempt ".$attempts." failed!".PHP_EOL;
            if ($attempts < $max_attempts) {
                echo "Retrying with a new proxy".PHP_EOL;
            }
            $attempts += 1;
        }
    }
    # null when every attempt failed, hence the nullable return type
    return $response;
}
$response = rotating_proxy_request('get', 'https://httpbin.org/ip');
// $response = rotating_proxy_request('get', 'https://www.g2.com/products/zenrows/reviews'); # 403
echo $response;
Here's the full PHP script:
<?php
# composer's autoloader
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
function get_random_proxies(): array {
    $http_proxies = array(
        'http://190.43.92.130:999',
        'http://201.182.251.142:999',
        # ...
        'http://200.123.15.250:999'
    );
    $https_proxies = array(
        'http://5.78.76.237:8080',
        'http://8.218.239.205:8888',
        # ...
        'http://169.55.89.6:80'
    );
    $http_proxy = $http_proxies[array_rand($http_proxies)];
    $https_proxy = $https_proxies[array_rand($https_proxies)];
    # proxies
    $proxies = [
        'http' => $http_proxy,
        'https' => $https_proxy,
    ];
    return $proxies;
}
function rotating_proxy_request(string $http_method, string $targetUrl, int $max_attempts = 3): ?string
{
    $response = null;
    $attempts = 1;
    while ($attempts <= $max_attempts) {
        $proxies = get_random_proxies();
        echo "Using proxy: ".json_encode($proxies).PHP_EOL;
        $client = new Client([
            RequestOptions::PROXY => $proxies,
            RequestOptions::VERIFY => false, # disable SSL certificate validation
            RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
        ]);
        try {
            $body = $client->request(strtoupper($http_method), $targetUrl)->getBody();
            $response = $body->getContents();
            break;
        } catch (\Exception $e) {
            echo $e->getMessage().PHP_EOL;
            echo "Attempt ".$attempts." failed!".PHP_EOL;
            if ($attempts < $max_attempts) {
                echo "Retrying with a new proxy".PHP_EOL;
            }
            $attempts += 1;
        }
    }
    # null when every attempt failed, hence the nullable return type
    return $response;
}
$response = rotating_proxy_request('get', 'https://httpbin.org/ip');
// $response = rotating_proxy_request('get', 'https://www.g2.com/products/zenrows/reviews'); # 403
echo $response;
If you run the script, you'll get a result similar to the one in the previous section, except that the request will be attempted up to three times if it doesn't succeed.
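Because the proxies are picked at random, the same dead proxy can be chosen twice in a row. One possible variation, sketched below, shuffles the pool once and tries each proxy at most one time. The $send callable is a hypothetical stand-in for whatever performs the request through a given proxy (for example, a closure wrapping a Guzzle client), so the example runs offline:

```php
<?php
# Sketch: try each proxy at most once, in random order. $send is a
# hypothetical callable that performs the request through the given proxy
# and throws an exception on failure.
function request_without_repeats(callable $send, array $proxyPool, int $maxAttempts = 3): ?string
{
    shuffle($proxyPool); # random order, so no proxy is picked twice
    $candidates = array_slice($proxyPool, 0, $maxAttempts);
    foreach ($candidates as $proxy) {
        try {
            return $send($proxy);
        } catch (\Exception $e) {
            # this proxy failed; move on to the next one
            continue;
        }
    }
    return null; # every attempt failed
}

# Stand-in sender so the sketch runs offline: only one proxy "works".
$fakeSend = function (string $proxy): string {
    if ($proxy !== 'http://5.78.76.237:8080') {
        throw new \RuntimeException('proxy failed: ' . $proxy);
    }
    return '{"origin": "5.78.76.237"}';
};
$pool = ['http://190.43.92.130:999', 'http://201.182.251.142:999', 'http://5.78.76.237:8080'];
echo request_without_repeats($fakeSend, $pool) . PHP_EOL; # prints {"origin": "5.78.76.237"}
```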
Yet, free proxies aren't reliable and are likely to fail. Let's test that by making a request to G2.com.
Update the script with the code below:
# ...
$response = rotating_proxy_request('get', 'https://www.g2.com/products/zenrows/reviews');
echo $response;
Run it, and you'll get something like this:
We got an error with a status code of 403 (Forbidden), probably because the request coming through the free proxy was identified as a bot.
A better solution is to use a premium proxy. Let's see that next.
Premium Proxy to Avoid Getting Blocked
Using premium proxies is the best way to avoid getting blocked. They used to be expensive, but solutions like ZenRows have changed that, with plans starting at just $69 per month. It also offers geolocation, and you only pay for successful requests.
Sign up for ZenRows to get 1,000 free URLs.
Once you have your account, copy your ZenRows API key from the proxy URL. Also, activate the premium proxy rotator, plus the JS rendering to be better prepared.
Update your PHP script like this:
<?php
# composer's autoloader
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
# make request to
$targetUrl = 'https://www.g2.com/products/zenrows/reviews';
$proxy = 'http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337';
# proxies
$proxies = [
    'http' => $proxy,
    'https' => $proxy,
];
$client = new Client([
    RequestOptions::PROXY => $proxies,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
]);
try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}
Replace <ZENROWS_PROXY_USERNAME> and <ZENROWS_PROXY_PASSWORD> with the credentials from the proxy URL you copied earlier.
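To keep credentials out of your source code, you might read them from environment variables and build the proxy string at runtime. The sketch below assumes two variable names, PROXY_USERNAME and PROXY_PASSWORD, which are made up for this example and are not names ZenRows defines:

```php
<?php
# Sketch: read proxy credentials from environment variables. The variable
# names PROXY_USERNAME and PROXY_PASSWORD are assumptions for this example.
function proxy_url_from_env(string $host, int $port): string
{
    $username = getenv('PROXY_USERNAME');
    $password = getenv('PROXY_PASSWORD');
    if ($username === false || $password === false) {
        throw new \RuntimeException('Set PROXY_USERNAME and PROXY_PASSWORD first');
    }
    return sprintf(
        'http://%s:%s@%s:%d',
        rawurlencode($username),
        rawurlencode($password),
        $host,
        $port
    );
}

# demo values set inline so the sketch runs; normally you'd set them in your shell
putenv('PROXY_USERNAME=demo_user');
putenv('PROXY_PASSWORD=demo_pass');
echo proxy_url_from_env('superproxy.zenrows.com', 1337) . PHP_EOL;
# prints http://demo_user:demo_pass@superproxy.zenrows.com:1337
```

The returned string can be assigned to $proxy in the script above.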
Run it, and you'll see the output:
Congrats! Your premium Guzzle proxy script is ready for use!
ZenRows gives you all the tools you need, like premium proxies and JS rendering. You can explore all the options provided on your ZenRows dashboard.
Conclusion
This tutorial shows the steps you need to use proxies with Guzzle. Now you know:
- The basics of using a proxy with Guzzle.
- How to implement a rotating proxy.
- Why premium proxies are better and how to use them.
As free proxies are unreliable and should be used for testing only, consider using a premium provider. ZenRows has a reliable rotating proxy system via API calls and comes with other advanced anti-bot bypass features to ensure the success of your scraper.