Many websites use Cloudflare to detect and block bots, including web scrapers. However, you can use CloudUnflare to bypass Cloudflare and get the data you want.
Let's see how!
What Is CloudUnflare
CloudUnflare is an open-source reconnaissance tool for uncovering the IP address of a target domain's origin server that's behind Cloudflare's network.
When Cloudflare protects a website, the actual IP address of the web server is hidden. CloudUnflare tries to uncover that real IP so you can access the web server directly for your scraping needs.
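To make "accessing the web server directly" concrete, here's a minimal Python sketch of what such a request could look like once you have a candidate origin IP. It's an illustration, not part of CloudUnflare: the function names are hypothetical, 203.0.113.10 is a documentation-only placeholder IP, and it assumes the origin accepts plain HTTP with a matching Host header (many origins reject traffic that doesn't come from Cloudflare).

```python
import urllib.request


def build_origin_request(origin_ip, domain, path="/"):
    """Build a request aimed at the origin IP while naming the real site.

    The URL points at the IP address, and the Host header tells the web
    server which site we want. Both arguments are placeholders here.
    """
    url = f"http://{origin_ip}{path}"
    headers = {
        "Host": domain,               # the site served by the origin
        "User-Agent": "Mozilla/5.0",  # generic UA; an assumption, adjust as needed
    }
    return urllib.request.Request(url, headers=headers)


def fetch_from_origin(origin_ip, domain, timeout=10):
    """Send the request and return (status code, decoded body)."""
    request = build_origin_request(origin_ip, domain)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

Calling fetch_from_origin("203.0.113.10", "example.com") would send the request. If the returned page matches the real site, the IP is likely the origin; an error or an unrelated page suggests it isn't.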
How CloudUnflare Works
While CloudUnflare doesn't fully document how it works internally, we can infer its approach from its reconnaissance report. Generally, it uses several techniques to gather information and then analyzes that data to uncover the real IP address behind Cloudflare.
When you pass a domain name to the tool, it starts by checking the associated subdomains. Next, it looks for any CNAME records, which are DNS entries that map one domain name to another (like an alias).
Additionally, it uses the CompleteDNS API to get historical information about the domain's name servers. By combining these techniques and more, CloudUnflare can uncover the actual IP address of a Cloudflare-protected website.
How to Use CloudUnflare
CloudUnflare is a Bash script designed for Linux systems. So, if you want to run this tool on Windows, you need a Linux environment.
To get started, create and verify an account on CompleteDNS to explore DNS records.
Then, install the required dependencies (cURL, dig, and WHOIS) with the following command:
apt-get install curl dnsutils whois -y
If the installation requires root access, add the prefix sudo before the above command to execute it with admin privileges.
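If you'd like to confirm the dependencies are in place before running the tool, a small Python check along these lines may help. It's only a sketch: the function name is hypothetical, and it simply looks for the curl, dig, and whois executables on your PATH.

```python
import shutil

# Command-line tools the install step above is meant to provide.
REQUIRED_TOOLS = ("curl", "dig", "whois")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the given list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means all three tools were found. Otherwise, the returned names tell you what still needs installing.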
After that, install CloudUnflare by cloning its GitHub repository.
git clone https://github.com/greycatz/CloudUnflare.git
Then navigate to the CloudUnflare directory and list its contents using the ls command to confirm that cloudunflare.bash is there.
Open cloudunflare.bash in a text editor. To do it directly from your terminal, use the following command:
nano cloudunflare.bash
Locate the CompleteDNS_Login variable and set its value to your CompleteDNS API credentials. Then, save your changes and exit the text editor.
Next, run the script:
bash cloudunflare.bash
Enter the domain you want to scrape and wait for the tool to uncover the actual IP address you're after. For this example, let's investigate a Cloudflare-protected website: g2.com. Here's what we got:
The result above shows that CloudUnflare encountered an error due to the unavailability of "NS History by CompleteDNS". After running checks on the subdomains, the tool could only provide the IP addresses of the subdomains, some of which belong to Cloudflare.
That probably happens because CloudUnflare is no longer maintained, particularly its integration with viewDNS.info, the platform it relies on for IP history data.
Also note that this web scraping method is unreliable even when the tool works, because it only delivers results in a small number of cases.
So, let's look at the best CloudUnflare alternative next.
Best CloudUnflare Alternative
ZenRows is a web scraping API that offers a reliable solution to bypass Cloudflare and all other anti-bot systems. You can skip the tedious process of discovering the origin server's IP address and, instead, retrieve the data you want with a single API call.
To use ZenRows, create a free account. You'll land on the Request Builder, where you can select your preferred language and enter your target URL (e.g., https://www.g2.com/). Check the boxes for "Premium Proxies" and "JS Rendering". That will generate the code to use.
Now, install an HTTP library to make a request. We'll use Python Requests.
pip install requests
Copy the code from ZenRows and run it in your IDE.
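import requests

# Target page and your ZenRows API key
url = 'https://www.g2.com/'
apikey = '<YOUR_ZENROWS_API_KEY>'

# Enable JavaScript rendering and premium proxies
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}

# Send the request through the ZenRows API and print the returned HTML
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)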
This should be your output:
//..
<title>Business Software and Services Reviews | G2</title>
//..
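Rather than scanning the printed HTML by eye, you could pull out the page title programmatically to confirm you received the real page and not a challenge page. Here's a minimal sketch using only Python's standard library. The TitleParser class and extract_title function are hypothetical helpers, not part of ZenRows.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only track the first <title>; ignore any later ones.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

With the response from the earlier request, extract_title(response.text) gives you the title to compare against the one you expect for the target page.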
Bingo! You've bypassed your first Cloudflare-protected website.
Conclusion
While reconnaissance tools can provide information about a target domain, using them as a scraping approach is tedious and unreliable. Furthermore, we saw that CloudUnflare is no longer maintained. Fortunately, you can use ZenRows as a more effective and scalable alternative.