CloudFail to Scrape Cloudflare Sites in 2024

June 8, 2023 · 4 min read

Many modern websites rely on Cloudflare to enhance security, optimize performance, and protect against malicious attacks. But its anti-bot system blocks automated traffic, scrapers included. Tools like CloudFail try to get around it through the back door: the target website's origin server.

In this tutorial, you'll learn how to bypass Cloudflare using CloudFail. You'll also explore a more efficient and scalable alternative.

Ready? Let's dive in.

What Is CloudFail

CloudFail is an open-source reconnaissance tool that tries to uncover the IP address of the origin server behind a Cloudflare-protected website. With that address, a web scraper can attempt to request pages from the origin directly instead of going through Cloudflare.
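To make the idea of "requesting the origin directly" concrete, here is a minimal sketch using only Python's standard library. It isn't part of CloudFail: the `build_origin_request` helper, the IP `203.0.113.10` (a documentation-only address), and `example.com` are all placeholders. The sketch also assumes the origin answers plain HTTP, since HTTPS to a bare IP needs extra certificate handling.

```python
import urllib.request


def build_origin_request(origin_ip, hostname, path="/"):
    """Build a request aimed at an origin IP while naming the real site in the Host header.

    Hypothetical helper for illustration only; plain HTTP keeps the sketch simple.
    """
    url = f"http://{origin_ip}{path}"
    # The Host header tells the origin server which site we want,
    # even though the URL itself contains only the IP address.
    return urllib.request.Request(url, headers={"Host": hostname})


# Placeholder values: 203.0.113.10 is a reserved documentation address.
req = build_origin_request("203.0.113.10", "example.com")
print(req.full_url, req.get_header("Host"))
```

Passing `req` to `urllib.request.urlopen` would send the request. Whether the origin answers depends entirely on how the site is configured.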

How CloudFail Works

CloudFail takes advantage of website misconfigurations that expose the origin server. The tool goes through three different phases while using Tor to mask all requests:

1. Misconfigured DNS scan using DNSDumpster.com: When you run your target website through CloudFail, it queries DNSDumpster.com. That service takes a domain and produces its DNS information. The result is then analyzed to identify any misconfiguration in the DNS setup that can provide information about the origin server.

2. Scan the CrimeFlare database: CrimeFlare is a historical database of Cloudflare-protected websites and their related data. In this phase, CloudFail queries it for data associated with the target domain. That may include subdomains that aren't behind Cloudflare protection. Sometimes, CloudFail may pick up the actual IP address.

3. Brute-force scan over 2,500 subdomains: In the third phase, CloudFail runs a brute-force scan. It checks 2,500 common subdomain names against the target domain to find any that are misconfigured and point outside Cloudflare.
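The third phase can be sketched in a few lines of Python. This is not CloudFail's actual code, only an illustration of the technique: resolve each candidate subdomain and keep the ones whose IP falls outside Cloudflare's address space. The range list is a partial snapshot of Cloudflare's published IPv4 ranges and may be outdated, and the function names and word list are hypothetical.

```python
import ipaddress
import socket

# Assumption: partial snapshot of Cloudflare's published IPv4 ranges.
# Check Cloudflare's own list for the current, complete set.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "104.16.0.0/13",
        "104.24.0.0/14",
        "162.158.0.0/15",
        "172.64.0.0/13",
        "173.245.48.0/20",
    )
]


def is_cloudflare_ip(ip):
    """Return True if the IP belongs to one of the listed Cloudflare ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in CLOUDFLARE_RANGES)


def resolve(hostname):
    """Resolve a hostname to an IPv4 address, or None if it doesn't resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def scan_subdomains(domain, words, resolver=resolve):
    """Return {hostname: ip} for subdomains that resolve outside Cloudflare."""
    exposed = {}
    for word in words:
        host = f"{word}.{domain}"
        ip = resolver(host)
        if ip is not None and not is_cloudflare_ip(ip):
            exposed[host] = ip
    return exposed
```

For example, `scan_subdomains("example.com", ["mail", "ftp", "dev"])` would perform real DNS lookups for three hypothetical subdomains. The `resolver` parameter exists so the lookup can be swapped out, for instance with a stub in tests.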


How to Use CloudFail

On GNU/Linux

To use CloudFail, install pip3 for Python3 dependencies using the following command:

Terminal
sudo apt-get install python3-pip

Then, create the directory where you want to store the CloudFail source code and navigate to it.

Terminal
mkdir CloudFail_Scraper
cd CloudFail_Scraper

In the directory, clone CloudFail's repository with the command below:

Terminal
git clone https://github.com/m0rtem/CloudFail

Now that the source code is saved in your directory, move into the cloned CloudFail folder and grant execute permission to the cloudfail.py file.

Terminal
cd CloudFail
chmod +x cloudfail.py

Lastly, install the dependencies CloudFail needs.

Terminal
pip3 install -r requirements.txt

The modules covered by the above command include argparse, colorama, socket, binascii, datetime, requests, win_inet_pton, and dnspython. Several of them (argparse, socket, binascii, and datetime) already ship with Python 3.
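If you want to confirm the installation before running the tool, a quick import check like the sketch below can help. The `missing_modules` helper is hypothetical and not part of CloudFail. Note that dnspython is imported under the name `dns`, and win_inet_pton is left out here because it only matters on Windows.

```python
import importlib.util

# Import names for the modules mentioned above (dnspython is imported as "dns").
MODULES = ["argparse", "colorama", "socket", "binascii", "datetime", "requests", "dns"]


def missing_modules(names):
    """Return the module names that Python cannot find in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All modules are available.")
```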

Everything's set up now, so let's investigate SEO.com using this command:

Terminal
python3 cloudfail.py --target seo.com

Here's the result:

CloudFail Result

On Windows

The process is the same on Windows, except that the execute permission step isn't necessary.

Download CloudFail's source code and store it in an easy-to-locate directory. Then, navigate to the CloudFail folder inside that directory and install the dependencies, as we did on GNU/Linux.

The final step is to investigate SEO.com from the command line, using the same command as on GNU/Linux.

Best CloudFail Alternative

Fortunately, ZenRows offers a viable, efficient, and scalable approach to bypassing Cloudflare and any anti-bot detection solution. You can retrieve the necessary data undetected by passing your target URL and making a single API call.

To try it for free, sign up, and you'll land on the Request Builder. Next, select Python (although it works with any language) and paste your target URL (we'll use seo.com). Then, check the boxes for Premium Proxies and JS Rendering to set these parameters to true. With that, you'll get the code to run.

ZenRows Request Builder

Now, install the Python Requests library using the following command (or use any other HTTP library):

Terminal
pip install requests

Copy the code from ZenRows into your favorite editor and run it. Here's the final code:

scraper.py
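# pip install requests
import requests

url = 'https://seo.com/'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',      # JS Rendering option from the Request Builder
    'premium_proxy': 'true',  # Premium Proxies option from the Request Builder
}
# Send the target URL and options to the ZenRows API
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)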

It'll print the page's HTML. Congratulations on bypassing your first Cloudflare-protected website.
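If you'd rather not install a third-party library, the same call can be sketched with Python's standard library. This is an illustrative variant, not code generated by ZenRows: the helper names are hypothetical, and it assumes that leaving out `js_render` or `premium_proxy` is equivalent to leaving the box unchecked. It also adds a timeout and basic error handling, which the generated snippet omits.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.zenrows.com/v1/"


def build_request_url(target_url, apikey, js_render=True, premium_proxy=True):
    """Build the full API URL with the same query parameters as the snippet above."""
    params = {"url": target_url, "apikey": apikey}
    # Assumption: omitting a flag leaves that feature turned off.
    if js_render:
        params["js_render"] = "true"
    if premium_proxy:
        params["premium_proxy"] = "true"
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_html(target_url, apikey, timeout=60):
    """Fetch the target page through the API and return its HTML as text."""
    request_url = build_request_url(target_url, apikey)
    try:
        with urlopen(request_url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        raise RuntimeError(f"API returned HTTP {err.code}") from err
    except URLError as err:
        raise RuntimeError(f"Could not reach the API: {err.reason}") from err
```

Calling `fetch_html('https://seo.com/', '<YOUR_ZENROWS_API_KEY>')` would then return the page's HTML, or raise a `RuntimeError` describing what went wrong.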

Conclusion

While bypassing Cloudflare by sending requests directly to your target website's origin server was possible with CloudFail, the tool no longer works. Besides, this approach is unreliable because not every website's real IP address can be found.

However, ZenRows works and is a much more efficient and easily scalable alternative. With a single API call, you can bypass any anti-bot measures. Sign up now to get your 1,000 free API credits.
