How to Bypass Cloudflare in PHP (2 Methods)

Ander Rodriguez
September 19, 2024 · 4 min read

Does Cloudflare keep blocking your PHP web scraper? 

Getting blocked by Cloudflare is common during web scraping. We've been there.

This article covers two ways to bypass Cloudflare while scraping with PHP. We'll run each solution against the Cloudflare Challenge page to test its strength.

  1. Selenium Stealth.
  2. ZenRows Web Scraping API.

Can Cloudflare Detect PHP Scrapers?

The short answer is yes. Cloudflare can detect and block PHP scrapers. Let's see why:

Cloudflare is a content delivery network (CDN) and cloud security service that protects websites against security threats like DDoS attacks. However, it also targets activities like web scraping. Because Cloudflare is one of the top web application firewalls (WAFs), you'll likely encounter it during scraping.

PHP web scrapers can't bypass Cloudflare security independently. Standard PHP HTTP clients, such as cURL, lack the default ability to avoid Cloudflare's anti-bot detection.

Standard HTTP clients present bot-like attributes, such as incomplete request headers, a missing or suspicious User Agent, and an inability to execute JavaScript or pass browser fingerprinting tests.

All these limitations make PHP web scrapers vulnerable to Cloudflare's detection. For instance, a Cloudflare-protected page like the Cloudflare Challenge page will block PHP's cURL scraper.

Try it out with the following code:

Example
<?php
// initialize the cURL session
$ch = curl_init();

// set the URL for the HTTP request
curl_setopt($ch, CURLOPT_URL, 'https://www.scrapingcourse.com/cloudflare-challenge');

// specify the HTTP method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// return the response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// execute the cURL request and store the response
$response = curl_exec($ch);

// output the response
echo $response . PHP_EOL;

// close the cURL session
curl_close($ch);

The scraper got stuck in Cloudflare's interstitial page:

Output
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Just a moment...</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2 class="h2" id="challenge-running">
        Checking if the site connection is secure
    </h2>
    <!-- ... -->
</body>
</html>
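If you want your scraper to notice this situation instead of treating the interstitial as real content, a simple check on the response body can help. The snippet below is a minimal sketch: `looksLikeCloudflareChallenge()` is a hypothetical helper, and its markers are taken only from the blocked output shown above, so they may need adjusting if Cloudflare changes the page.

```php
<?php
// Hypothetical helper: returns true when the HTML resembles the
// Cloudflare interstitial shown in the output above.
function looksLikeCloudflareChallenge(string $html): bool
{
    // markers copied from the blocked response; assumed, not exhaustive
    $markers = [
        '<title>Just a moment...</title>',
        'id="challenge-running"',
    ];

    foreach ($markers as $marker) {
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }

    return false;
}

// sample inputs for illustration
$blocked = '<html><head><title>Just a moment...</title></head><body></body></html>';
$normal  = '<html><head><title>Example Page</title></head><body></body></html>';

var_dump(looksLikeCloudflareChallenge($blocked)); // bool(true)
var_dump(looksLikeCloudflareChallenge($normal));  // bool(false)
```

You could call this helper on `$response` right after `curl_exec()` and log or retry when it returns true.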

Cloudflare uses active and passive bot detection techniques to protect websites and keeps updating those security measures. These frequent updates make it increasingly challenging to bypass Cloudflare. Some of its bot detection mechanisms include:

  • TLS fingerprinting: One of Cloudflare's detection methods is to compare an incoming request's fingerprints with pre-collected ones to validate its authenticity. It then flags fingerprints with unusual parameters while allowing those matching trusted fingerprints. 
  • Request header analysis: Cloudflare also scans incoming request headers for bot-like parameters and blocks suspicious ones.
  • IP address reputation: Another Cloudflare detection method is to track and block a request based on its IP address's reputation, such as fraud history, geolocation, source, etc. For instance, residential IPs are considered more credible than those originating from data centers.
  • Behavior monitoring: Cloudflare tracks user interactions, such as click actions, mouse movements, scrolling, navigation, and more, to determine whether a request is from a bot.

These are only some of Cloudflare's common detection methods. Your scraper must bypass them all to get your desired data. Fortunately, there are ways to evade Cloudflare while scraping with PHP. Keep reading to learn them.
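To make the request header analysis point concrete, here's a minimal sketch of a cURL handle that sends a fuller, browser-like header set. The helper names (`browserLikeHeaders()`, `createBrowserLikeHandle()`) and the header values are illustrative assumptions, not a verified bypass. This only addresses one of the checks listed above and, on its own, is unlikely to get past TLS fingerprinting or JavaScript challenges.

```php
<?php
// Illustrative header values (assumptions): copy real ones from your
// own browser's network tab for better consistency.
function browserLikeHeaders(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9',
        'Upgrade-Insecure-Requests: 1',
    ];
}

// Hypothetical helper: builds a cURL handle with a custom User Agent
// and the headers above. It does not send the request.
function createBrowserLikeHandle(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => browserLikeHeaders(),
        CURLOPT_ENCODING       => '', // accept every encoding cURL supports
    ]);

    return $ch;
}

// Usage (performs a network request, so it's left commented out):
// $ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36';
// $ch = createBrowserLikeHandle('https://www.scrapingcourse.com/cloudflare-challenge', $ua);
// $html = curl_exec($ch);
// curl_close($ch);
```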

1. Selenium Stealth

Selenium Stealth is a patch for the standard Selenium WebDriver in PHP. Although not actively maintained, it sometimes bypasses low-level protections and is a good alternative to scraping with the standard PHP Selenium WebDriver.

For instance, when you run the Selenium WebDriver in headless mode, Selenium Stealth replaces the bot-like HeadlessChrome User Agent with a regular Chrome User Agent, making the request appear more legitimate. However, like many other open-source solutions, Selenium Stealth still leaks some bot-like attributes and can't keep up with Cloudflare's security updates, so it gets blocked easily.
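The User Agent replacement is easy to picture with a plain string operation. The sketch below only illustrates the idea and isn't Selenium Stealth's actual implementation: `stripHeadlessMarker()` is a hypothetical helper, and the sample User Agent string (including its version number) is an assumed example.

```php
<?php
// Hypothetical helper: swaps the "HeadlessChrome" token for "Chrome"
// so the User Agent matches what a regular Chrome browser sends.
function stripHeadlessMarker(string $userAgent): string
{
    return str_replace('HeadlessChrome', 'Chrome', $userAgent);
}

// assumed example of a headless Chrome User Agent
$headlessUa = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/126.0.0.0 Safari/537.36';

echo stripHeadlessMarker($headlessUa) . PHP_EOL;
// Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36
```

If you wanted to apply such a value manually, Chrome accepts a `--user-agent=` launch argument, which you could pass through `ChromeOptions::addArguments()`.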

👍 Pros

  • Support for dynamic content scraping.
  • It patches the Selenium WebDriver in headless mode.
  • Screenshot capability.
  • Mimic human interactions.
  • Increases your chances of bypassing Cloudflare.

👎 Cons

  • It can't keep up with Cloudflare's frequent updates.
  • Browser instance increases memory overhead.
  • It's unsuitable for large-scale projects.
  • It leaks some detectable bot-like attributes.
  • Steep learning curve.

How to Bypass Cloudflare in PHP Using Selenium Stealth

Again, let's see how Selenium Stealth performs against the previous target page (the Cloudflare Challenge page).

First, install Selenium and Selenium Stealth using Composer:

Terminal
composer require php-webdriver/webdriver
composer require sapistudio/seleniumstealth

Now, let's send a request to the target page using the following Selenium Stealth scraper:

Example
<?php
// composer require php-webdriver/webdriver
// composer require sapistudio/seleniumstealth

use Facebook\WebDriver\Remote\RemoteWebDriver;
use SapiStudio\SeleniumStealth\SeleniumStealth;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Chrome\ChromeOptions;
require_once 'vendor/autoload.php';

$capabilities = DesiredCapabilities::chrome();
// define the browser options
$chromeOptions = new ChromeOptions();

// define the server URL where the WebDriver is running
$serverUrl = 'http://localhost:4444';

// run Chrome in headless mode
$chromeOptions->addArguments(['--headless']);

// register the Chrome options
$capabilities->setCapability(ChromeOptions::CAPABILITY_W3C, $chromeOptions);
// add the browser capabilities to Chrome instance
$driver = RemoteWebDriver::create($serverUrl, $capabilities);

// add the stealth plugin to the WebDriver
$driver = (new SeleniumStealth($driver))->usePhpWebriverClient()->makeStealth();

// maximize the window to avoid responsive rendering
$driver->manage()->window()->maximize();

// navigate to the desired URL
$driver->get('https://www.scrapingcourse.com/cloudflare-challenge');

// get the page source and print it
$html = $driver->getPageSource();
echo $html;

// close the browser session
$driver->quit();
?>

Unfortunately, Selenium Stealth couldn't bypass the Cloudflare anti-bot on the challenge page.
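To confirm this outcome in code rather than by reading the HTML, you can poll the page title for a few seconds and see whether the interstitial ever clears. The snippet below is a minimal sketch under two assumptions: the interstitial title contains "Just a moment" (as in the blocked output earlier), and `waitForChallengeToClear()` is a hypothetical helper that receives a callable, so it isn't tied to any particular driver.

```php
<?php
// Hypothetical helper: calls $getTitle repeatedly until the title no
// longer looks like the Cloudflare interstitial, or attempts run out.
function waitForChallengeToClear(callable $getTitle, int $maxAttempts = 10, int $delayMs = 1000): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // assumed marker, taken from the interstitial's <title>
        if (stripos((string) $getTitle(), 'Just a moment') === false) {
            return true;
        }

        // pause between attempts, but not after the last one
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }

    return false;
}

// Usage with the WebDriver instance from the scraper above
// (commented out because it needs a running browser session):
// $cleared = waitForChallengeToClear(function () use ($driver) {
//     return $driver->getTitle();
// });
// echo $cleared ? 'Challenge cleared' : 'Still blocked';
```

A `false` result after the last attempt means the driver never left the challenge page, which matches the block described here.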

Although Selenium Stealth patches Selenium's bot-like attributes, Cloudflare still blocks it. That's because Cloudflare's security keeps advancing, blocking the evasion strategies of open-source bypass tools. 

Even if Selenium Stealth manages to bypass easier Cloudflare protections, it will eventually get blocked after a few requests.

The best way out of the above block is to use a reliable solution like ZenRows. Let's jump into it below.

2. ZenRows Web Scraping API

The ZenRows web scraping API is one of the top tools for bypassing Cloudflare while scraping with PHP or any other programming language. It helps you handle fingerprinting, proxy management, JavaScript rendering, and more under the hood. With ZenRows' AI-powered anti-bot and CAPTCHA auto-bypass feature, you'll evade any block while scraping.

ZenRows is also lightweight, saving you time and making your scraping tasks more efficient. You only need to make a single API call with your PHP scraper and watch ZenRows handle the hard tasks behind the scenes. If you need to automate web actions, ZenRows provides a headless browsing feature, allowing you to execute user interactions.

👍 Pros

  • Bypass Cloudflare and all other anti-bots.
  • Highly scalable.
  • Compatible with any programming language.
  • Premium residential proxies and anti-CAPTCHA.
  • Easy to use.
  • It only requires a single API call.
  • Headless browsing feature to scrape dynamic content and mimic human interactions.
  • It saves time and memory overhead.
  • Support for screenshots.
  • Intuitive request dashboard to monitor requests and usage statistics.

👎 Cons

  • It's a paid solution but offers a free trial and only charges for successful requests.

How to Bypass Cloudflare in PHP Using ZenRows

To see how ZenRows works, we'll request the previous Cloudflare Challenge page that blocked our PHP scraper.

To start, sign up to open the ZenRows Request Builder.

Paste the target URL in the link box and activate Premium Proxies and JS Rendering. Select PHP as your programming language and choose the API connection mode. Copy and paste the generated code into your scraper file.

[Image: Building a scraper with ZenRows]

Here's the generated code:

Example
<?php
$ch = curl_init();
curl_setopt(
    $ch,
    CURLOPT_URL,
    'https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fcloudflare-challenge&js_render=true&premium_proxy=true'
);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
echo $response . PHP_EOL;
curl_close($ch);
?>

The request returns the full-page HTML, confirming that the scraper got past the challenge:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Cloudflare Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Cloudflare challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

That was easy 🎉! Without code manipulations or technicalities, you just scraped a Cloudflare-protected website using ZenRows.
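If you'd rather not hand-encode the target URL in the query string, you can build the request URL programmatically. The sketch below assumes only the endpoint and the four parameters that appear in the generated code (`apikey`, `url`, `js_render`, `premium_proxy`). `buildZenRowsUrl()` is a hypothetical helper, not part of any ZenRows SDK.

```php
<?php
// Hypothetical helper: assembles the same API URL as the generated
// code, letting http_build_query() handle the percent-encoding.
function buildZenRowsUrl(string $apiKey, string $targetUrl): string
{
    $query = http_build_query([
        'apikey'        => $apiKey,
        'url'           => $targetUrl,
        'js_render'     => 'true',
        'premium_proxy' => 'true',
    ], '', '&', PHP_QUERY_RFC3986);

    return 'https://api.zenrows.com/v1/?' . $query;
}

// placeholder key: replace with your own
echo buildZenRowsUrl(
    'YOUR_ZENROWS_API_KEY',
    'https://www.scrapingcourse.com/cloudflare-challenge'
) . PHP_EOL;
// https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fcloudflare-challenge&js_render=true&premium_proxy=true
```

Pass the returned string to `CURLOPT_URL` in place of the hard-coded URL. This makes it easier to reuse the scraper for other target pages.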

Conclusion

You've learned how Cloudflare works and how to bypass it while scraping with PHP using a paid solution and an open-source tool. As mentioned earlier, Cloudflare's security measures are advanced, and it can easily detect open-source tools even if you fortify them with proxies and custom request headers.

The best way to bypass Cloudflare and any other anti-bot, regardless of its complexity, is to use ZenRows. All it takes is a single API call, and ZenRows handles all the bypass tasks behind the scenes.

Try ZenRows for free without a credit card!
