
How to Use a Proxy with Guzzle in 2024

April 27, 2023 · 7 min read

You've probably had your IP address blocked while trying to scrape information from a website. Annoying, right?

Luckily, you can hide your IP using a proxy and access the website as if you were someone else. And Guzzle, the popular PHP HTTP client, makes that easy.

So let's dive in and learn how to use a proxy with Guzzle.

What Is a Guzzle Proxy

When using Guzzle, a proxy is a server that stands between your client and the destination server. It sends your request to the target server and gives your client the response.

A proxy can also help you get around IP bans that block web scraping or restrict website access. Some proxies additionally cache responses, which reduces the number of requests that reach the destination server.

Let's see what you need to use a Guzzle proxy properly.

Prerequisites

Before moving on, ensure you have PHP >= v7.2.5 and Composer installed. Also, following this tutorial will be easier if you understand the basics of web scraping in PHP.

Create a demo directory and install Guzzle in it:

Terminal
composer require guzzlehttp/guzzle

Then, create a demo PHP file in the newly created directory and require Composer's autoloader:

program.php
<?php
# composer's autoloader
require 'vendor/autoload.php';

Now, let's set the proxy.

How to Use a Proxy with Guzzle

In this section, you'll learn to send a request with Guzzle using a proxy and to authenticate the proxy. You'll need some proxies to begin with, so get some from a free proxy list.

Ensure you pick valid proxies with decent uptime and that each proxy URL follows this format: <PROXY_PROTOCOL>://<PROXY_IP_ADDRESS>:<PROXY_PORT>
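
Free proxy lists often contain malformed entries, so it can help to check each string against that format before handing it to Guzzle. The snippet below is a minimal sketch using only PHP's built-in parse_url(). The function name is_valid_proxy_url and the list of accepted protocols are assumptions made for illustration, not part of Guzzle.

```php
<?php
// Hypothetical helper (not part of Guzzle): checks that a proxy string
// follows the <PROTOCOL>://<IP_ADDRESS>:<PORT> shape before it is used.
function is_valid_proxy_url(string $proxyUrl): bool
{
    $parts = parse_url($proxyUrl);

    // parse_url() returns false on seriously malformed input
    if ($parts === false) {
        return false;
    }

    // protocol, host, and port must all be present
    if (!isset($parts['scheme'], $parts['host'], $parts['port'])) {
        return false;
    }

    // assumption: only these proxy protocols are of interest
    $allowedSchemes = ['http', 'https', 'socks4', 'socks5'];

    return in_array(strtolower($parts['scheme']), $allowedSchemes, true);
}

var_dump(is_valid_proxy_url('http://190.43.92.130:999')); // bool(true)
var_dump(is_valid_proxy_url('190.43.92.130:999'));        // bool(false): no protocol
```

A check like this only confirms the shape of the string. It says nothing about whether the proxy is actually online.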

You can set a proxy in Guzzle in two ways: with request-options or with middleware.

Overall, if you need a simple and static proxy configuration, using request-options is a good choice. Meanwhile, using middleware provides more flexibility and control over proxy behavior, but it requires more setup.

Let's explore these two methods.

Method A: Set a Guzzle Proxy with request-options

To set a proxy with request-options, start by importing Guzzle's Client and RequestOptions classes:

program.php
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

Then, define your target URL and an associative array of all the proxies you'll use:

program.php
# make request to
$targetUrl = 'https://httpbin.org/ip';

# proxies
$proxies = [
    'http'  => 'http://190.43.92.130:999',
    'https' => 'http://5.78.76.237:8080',
];

The target URL points to HTTPBin's /ip endpoint, which returns the IP address of any client that sends it a GET request. Guzzle will route HTTP traffic through the proxy under the http key and HTTPS traffic through the proxy under the https key.
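
Besides the http and https keys, Guzzle's proxy option also accepts an optional no key that lists hosts which should skip the proxy, and it accepts a single string when one proxy should serve every protocol. The sketch below shows both shapes. The hosts under no are placeholders chosen for illustration, and this tutorial doesn't need them.

```php
<?php
// Array form: one proxy per protocol, plus hosts that bypass the proxy.
$proxies = [
    'http'  => 'http://190.43.92.130:999',    // used for http:// targets
    'https' => 'http://5.78.76.237:8080',     // used for https:// targets
    'no'    => ['localhost', '.example.com'], // hypothetical hosts that skip the proxy
];

// String form: the same proxy is applied to every protocol.
$singleProxy = 'http://190.43.92.130:999';
```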

Now, let's create a Guzzle client and pass the proxies we defined as the value for the Guzzle proxy option:

program.php
$client = new Client([
    RequestOptions::PROXY => $proxies,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);

Free proxies often interfere with SSL certificate checks, so the verify option disables certificate validation here. That's acceptable for a quick test, but keep validation enabled when you use a proxy you trust. The timeout option caps each request at thirty seconds.

Then, let's make the request and print the response:

program.php
try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}

At this point, your PHP script should look like this:

program.php
<?php
# composer's autoloader
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

# make request to
$targetUrl = 'https://httpbin.org/ip';

# proxies
$proxies = [
    'http'  => 'http://190.43.92.130:999',
    'https' => 'http://5.78.76.237:8080',
];

$client = new Client([
    RequestOptions::PROXY => $proxies,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);

try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}

Run your script using php <filename>.php, and you'll get an output similar to the one below:

Output
{
  "origin": "5.78.76.237"
}

Great! The value of the origin key is the IP address of the client that made the request to HTTPBin. Since the target URL uses HTTPS, that should be the IP of the https proxy you defined.
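
If you'd rather check this in code than by eye, you can compare the origin field with the host of the proxy URL. The following is a rough sketch: response_came_from_proxy is a hypothetical helper, and it assumes the proxy exits from the same IP it listens on, which isn't true of every provider.

```php
<?php
// Hypothetical helper: returns true when the "origin" reported by HTTPBin
// matches the host part of the proxy URL that handled the request.
function response_came_from_proxy(string $responseBody, string $proxyUrl): bool
{
    $data = json_decode($responseBody, true);
    $proxyHost = parse_url($proxyUrl, PHP_URL_HOST);

    if (!is_array($data) || !isset($data['origin']) || !is_string($proxyHost)) {
        return false;
    }

    // the origin field may hold several comma-separated addresses; check each
    $origins = array_map('trim', explode(',', $data['origin']));

    return in_array($proxyHost, $origins, true);
}

// sample body copied from the output above; in the script, pass the
// string returned by $body->getContents() instead
$sampleBody = '{"origin": "5.78.76.237"}';
var_dump(response_came_from_proxy($sampleBody, 'http://5.78.76.237:8080')); // bool(true)
```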

Method B: Use Middleware

Using middleware to set a Guzzle HTTP proxy is similar to the first method. The only difference is that we'll create and add proxy middleware to the default handler stack.

First, update your import like this:

program.php
# ...
use Psr\Http\Message\RequestInterface;
use GuzzleHttp\HandlerStack;
# ...

Next, create a proxy middleware by adding the code below right after your $proxies array. It intercepts each request and sets the proxies.

program.php
function proxy_middleware(array $proxies) 
{
    return function (callable $handler) use ($proxies) {
        return function (RequestInterface $request, array $options) use ($handler, $proxies) {
            # add proxy to request option
            $options[RequestOptions::PROXY] = $proxies; 
            return $handler($request, $options);
        };
    };
}
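
To see what this closure chain does without sending a request, you can run it against a stand-in handler. The sketch below mirrors the middleware above, with two changes so that it runs without Guzzle installed: the RequestInterface type hint is dropped, and the literal key 'proxy' (the value of RequestOptions::PROXY) is used. The names proxy_middleware_sketch and $fakeHandler are hypothetical.

```php
<?php
// Same shape as proxy_middleware() above, minus the RequestInterface type
// hint, so the sketch runs without Guzzle installed.
function proxy_middleware_sketch(array $proxies): callable
{
    return function (callable $handler) use ($proxies): callable {
        return function ($request, array $options) use ($handler, $proxies) {
            // 'proxy' is the string value behind RequestOptions::PROXY
            $options['proxy'] = $proxies;
            return $handler($request, $options);
        };
    };
}

// stand-in for Guzzle's real handler: it just reports the options it received
$fakeHandler = function ($request, array $options): array {
    return $options;
};

$proxies = [
    'http'  => 'http://190.43.92.130:999',
    'https' => 'http://5.78.76.237:8080',
];
$wrapped = proxy_middleware_sketch($proxies)($fakeHandler);

// the request object is irrelevant here, so null stands in for it
$seenOptions = $wrapped(null, ['timeout' => 30]);
print_r($seenOptions); // the original timeout plus the injected proxy entry
```

The handler receives the original options with the proxy entry added, which is how every request sent through the stack picks up the proxy.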

We can now add the middleware to the default handler stack and update our Guzzle client with the stack:

program.php
$stack = HandlerStack::create();
$stack->push(proxy_middleware($proxies));

$client = new Client([
    'handler' => $stack,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);

Your PHP script should look like this:

program.php
<?php
# composer's autoloader
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Psr\Http\Message\RequestInterface;
use GuzzleHttp\HandlerStack;

# make request to
$targetUrl = 'https://httpbin.org/ip';

# proxies
$proxies = [
    'http'  => 'http://username:password@190.43.92.130:999',
    'https' => 'http://username:password@5.78.76.237:8080',
];

function proxy_middleware(array $proxies) 
{
    return function (callable $handler) use ($proxies) {
        return function (RequestInterface $request, array $options) use ($handler, $proxies) {
            # add proxy to request option
            $options[RequestOptions::PROXY] = $proxies; 
            return $handler($request, $options);
        };
    };
}

$stack = HandlerStack::create();
$stack->push(proxy_middleware($proxies));

$client = new Client([
    'handler' => $stack,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
    RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
]);

try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}

Run the PHP script again, and you'll get results similar to those of the first method.


Proxy Authentication with Guzzle

Some proxy servers require client authentication before granting access, which is common when using premium proxies or commercial solutions. If that's your case, add the options for authentication, usually a username and password, to the proxy string.
The syntax of the Guzzle proxy string will then look like this:

program.php
<PROXY_PROTOCOL>://<USERNAME>:<PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>

Here's an example:

Example
# ...

# proxies
$proxies = [
  'http'  => 'http://username:password@190.43.92.130:999',
  'https' => 'http://username:password@5.78.76.237:8080',
];

# ...
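
Credentials that contain characters such as @ or : would break the proxy string if pasted in as-is. One way around that is to percent-encode them while assembling the URL. The helper below is a hypothetical sketch, not a Guzzle function. It assumes Guzzle is using its default cURL handler, which decodes percent-encoded proxy credentials.

```php
<?php
// Hypothetical helper: assembles an authenticated proxy URL and
// percent-encodes the credentials so "@" or ":" can't break the string.
function build_proxy_url(
    string $protocol,
    string $username,
    string $password,
    string $host,
    int $port
): string {
    return sprintf(
        '%s://%s:%s@%s:%d',
        $protocol,
        rawurlencode($username),
        rawurlencode($password),
        $host,
        $port
    );
}

echo build_proxy_url('http', 'username', 'p@ss:word', '190.43.92.130', 999), PHP_EOL;
// http://username:p%40ss%3Aword@190.43.92.130:999
```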

Use a Rotating Proxy with Guzzle

A rotating proxy is a proxy server that regularly switches between different IP addresses. It can help prevent IP blocking, as each request is sent from a different IP, making it harder for websites to identify bots coming from the same source.

Let's implement a rotating proxy with Guzzle, with a free solution first and then with a professional one.

Rotate IPs with a Free Solution

We'll start with a scraper that sends each request through a randomly chosen free proxy and retries with a new random pick, up to a set maximum number of attempts, until one succeeds.

First, write a function that returns random proxies:

program.php
function get_random_proxies(): array {
    $http_proxies = array(
        'http://190.43.92.130:999',
        'http://201.182.251.142:999',
        # ...
        'http://200.123.15.250:999'
    );
    
    $https_proxies = array(
        'http://5.78.76.237:8080',
        'http://8.218.239.205:8888',
        # ...
        'http://169.55.89.6:80'
    );
    $http_proxy = $http_proxies[array_rand($http_proxies)];
    $https_proxy = $https_proxies[array_rand($https_proxies)];
    # proxies
    $proxies = [
        'http'  => $http_proxy,
        'https' => $https_proxy,
    ];
    return $proxies;
}

Get new proxies from the Free Proxy List or our list of best proxies and update the $http_proxies and $https_proxies arrays accordingly.
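
Random selection can pick the same proxy several times in a row. If you'd prefer to use every proxy in the list before repeating one, a round-robin rotation is an alternative. The generator below is a minimal sketch of that idea. round_robin_proxies is a hypothetical name, and it isn't what the rest of this tutorial uses.

```php
<?php
// Hypothetical alternative to array_rand(): hand out proxies in a fixed
// cycle so every entry in the pool is used before any is repeated.
function round_robin_proxies(array $pool): Generator
{
    if ($pool === []) {
        return; // nothing to rotate through
    }

    $pool = array_values($pool); // normalise to 0-based integer keys
    $index = 0;

    while (true) {
        yield $pool[$index];
        $index = ($index + 1) % count($pool);
    }
}

$rotation = round_robin_proxies([
    'http://190.43.92.130:999',
    'http://201.182.251.142:999',
    'http://200.123.15.250:999',
]);

for ($i = 0; $i < 4; $i++) {
    echo $rotation->current(), PHP_EOL; // the fourth line repeats the first proxy
    $rotation->next();
}
```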

Now, add the function that sends the request with retries, and call it:

program.php
function rotating_proxy_request(string $http_method, string $targetUrl, int $max_attempts = 3): ?string
{
    $response = null; # stays null if every attempt fails
    $attempts = 1;

    while ($attempts <= $max_attempts) {
        $proxies = get_random_proxies();
        echo "Using proxy: ".json_encode($proxies).PHP_EOL;
        $client = new Client([
            RequestOptions::PROXY => $proxies,
            RequestOptions::VERIFY => false, # disable SSL certificate validation
            RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
        ]);
        try {
            $body = $client->request(strtoupper($http_method), $targetUrl)->getBody();
            $response = $body->getContents();
            break;
        } catch (\Exception $e) {
            echo $e->getMessage().PHP_EOL;
            echo "Attempt ".$attempts." failed!".PHP_EOL;
            if ($attempts < $max_attempts) {
                echo "Retrying with a new proxy".PHP_EOL;
            }
            $attempts += 1;
        }
    }
    return $response;
}

$response = rotating_proxy_request('get', 'https://httpbin.org/ip');
// $response = rotating_proxy_request('get', 'https://www.g2.com/products/zenrows/reviews'); # 403

echo $response;

Here's the full PHP script:

program.php
<?php
# composer's autoloader
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

function get_random_proxies(): array {
    $http_proxies = array(
        'http://190.43.92.130:999',
        'http://201.182.251.142:999',
        # ...
        'http://200.123.15.250:999'
    );
    
    $https_proxies = array(
        'http://5.78.76.237:8080',
        'http://8.218.239.205:8888',
        # ...
        'http://169.55.89.6:80'
    );
    $http_proxy = $http_proxies[array_rand($http_proxies)];
    $https_proxy = $https_proxies[array_rand($https_proxies)];
    # proxies
    $proxies = [
        'http'  => $http_proxy,
        'https' => $https_proxy,
    ];
    return $proxies;
}

function rotating_proxy_request(string $http_method, string $targetUrl, int $max_attempts = 3): ?string
{
    $response = null; # stays null if every attempt fails
    $attempts = 1;

    while ($attempts <= $max_attempts) {
        $proxies = get_random_proxies();
        echo "Using proxy: ".json_encode($proxies).PHP_EOL;
        $client = new Client([
            RequestOptions::PROXY => $proxies,
            RequestOptions::VERIFY => false, # disable SSL certificate validation
            RequestOptions::TIMEOUT => 30, # timeout of 30 seconds
        ]);
        try {
            $body = $client->request(strtoupper($http_method), $targetUrl)->getBody();
            $response = $body->getContents();
            break;
        } catch (\Exception $e) {
            echo $e->getMessage().PHP_EOL;
            echo "Attempt ".$attempts." failed!".PHP_EOL;
            if ($attempts < $max_attempts) {
                echo "Retrying with a new proxy".PHP_EOL;
            }
            $attempts += 1;
        }
    }
    return $response;
}

$response = rotating_proxy_request('get', 'https://httpbin.org/ip');
// $response = rotating_proxy_request('get', 'https://www.g2.com/products/zenrows/reviews'); # 403

echo $response;

If you run the script, you'll get results similar to those in the previous section, except that the request is attempted up to three times in total if it keeps failing.
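
Because each retry draws a proxy at random, a proxy that has just failed can be picked again. A possible variation is to shuffle the pool once and walk through it, as sketched below. The function request_with_distinct_proxies and the $send callable are hypothetical. In the real script, $send would wrap the Guzzle call, while here a fake sender keeps the example offline.

```php
<?php
// Hypothetical variant of the retry loop: shuffle the pool once and walk
// through it, so a proxy that just failed is not picked again.
// $send is any callable that takes a proxy URL and returns the response
// body, or throws on failure.
function request_with_distinct_proxies(array $pool, callable $send, int $maxAttempts = 3): ?string
{
    shuffle($pool);
    $candidates = array_slice($pool, 0, $maxAttempts);

    foreach ($candidates as $attempt => $proxy) {
        try {
            return $send($proxy);
        } catch (\Exception $e) {
            echo 'Attempt ' . ($attempt + 1) . ' via ' . $proxy
                . ' failed: ' . $e->getMessage() . PHP_EOL;
        }
    }

    return null; // every candidate failed
}

// offline stand-in for the real request: only one proxy in the pool "works"
$fakeSend = function (string $proxy): string {
    if ($proxy !== 'http://200.123.15.250:999') {
        throw new \RuntimeException('connection refused');
    }
    return '{"origin": "200.123.15.250"}';
};

$pool = [
    'http://190.43.92.130:999',
    'http://201.182.251.142:999',
    'http://200.123.15.250:999',
];
echo request_with_distinct_proxies($pool, $fakeSend), PHP_EOL;
```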

However, free proxies are unreliable and likely to fail. Let's test that by making a request to G2.com.

Update the script with the code below:

program.php
# ...
$response = rotating_proxy_request('get', 'https://www.g2.com/products/zenrows/reviews');

echo $response;

Run it, and you'll get something like this:

[Image: Free Rotating Proxy's Error Response]

We got an error with a status code of 403 (Forbidden), probably because the site identified the request coming through the free proxy as a bot.
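
When a request fails, it helps to know whether the proxy itself was unreachable or whether the target site answered with an error such as 403. Guzzle signals these cases with different exception classes. The sketch below uses Guzzle's MockHandler to replay a canned 403 response so it runs without a network call. That handler is an assumption made for the demo, and in the real script you'd keep the proxy options instead.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// MockHandler replays a canned 403 so the sketch runs offline
$mock = new MockHandler([new Response(403, [], 'Forbidden')]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$status = null;
try {
    $client->get('https://www.g2.com/products/zenrows/reviews');
} catch (ConnectException $e) {
    // the proxy or target could not be reached, so no HTTP response exists
    echo 'Connection failed: ' . $e->getMessage() . PHP_EOL;
} catch (RequestException $e) {
    // the target answered with an error status such as 403
    $status = $e->hasResponse() ? $e->getResponse()->getStatusCode() : null;
    echo 'Target responded with status ' . $status . PHP_EOL;
}
```

A connection failure suggests trying another proxy, while a 403 suggests the site is blocking the proxy's IP or the request pattern.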

A better solution is to use a premium proxy. Let's see that next.

Premium Proxy to Avoid Getting Blocked

Using premium proxies is the best way to avoid getting blocked. They used to be expensive, but solutions like ZenRows have changed that, with plans starting at just $49 per month. It also offers geolocation, and you only pay for successful requests.

Sign up for ZenRows to get 1,000 free API credits.

Once you have your account, copy your ZenRows API key from the proxy URL. Also, activate the premium proxy rotator, plus the anti-bot and JavaScript rendering to be better prepared.

[Image: ZenRows Dashboard]

Update your PHP script like this:

program.php
<?php
# composer's autoloader
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

# make request to
$targetUrl = 'https://www.g2.com/products/zenrows/reviews';

$proxy = 'http://<YOUR_ZENROWS_API_KEY>:js_render=true&antibot=true&premium_proxy=true@proxy.zenrows.com:8001';

# proxies
$proxies = [
    'http'  => $proxy,
    'https' => $proxy,
];

$client = new Client([
    RequestOptions::PROXY => $proxies,
    RequestOptions::VERIFY => false, # disable SSL certificate validation
]);

try {
    $body = $client->get($targetUrl)->getBody();
    echo $body->getContents();
} catch (\Exception $e) {
    echo $e->getMessage();
}

Replace <YOUR_ZENROWS_API_KEY> with the ZenRows API key you copied earlier. Run it, and you'll see the output:

[Image: Output]

Congrats! Your premium Guzzle proxy script is ready for use!

ZenRows gives you all the tools you need, like anti-bot and JS rendering. You can explore all the options provided on your ZenRows dashboard.

Conclusion

This tutorial shows the steps you need to use proxies with Guzzle. Now you know:

  • The basics of using a proxy with Guzzle.
  • How to implement a rotating proxy.
  • Why premium proxies are better and how to use them.

As free proxies are unreliable and should be used for testing only, consider using a premium provider. ZenRows has a reliable rotating proxy system via API calls and comes with other advanced anti-bot bypass features to ensure the success of your scraper.
