Many websites use anti-bot systems to detect and block automated traffic. This is where WhatWaf comes in: by identifying the firewall behind a site and suggesting possible bypasses, it helps developers build more robust scraping strategies. In this article, you'll learn how to use WhatWaf to bypass Cloudflare and other protections.
Ready? Let's get started.
What Is WhatWaf
WhatWaf is a Python-based tool designed to identify over 70 web application firewalls (WAFs) protecting a target server. It also comes with additional features that can be useful for bypassing anti-bot detection when web scraping, such as a built-in encoder that applies the discovered bypasses to your payloads.
How WhatWaf Works
While WhatWaf doesn't document its inner workings in detail, its reports give us a good idea of the process: it starts by sending GET requests to gather HTTP responses. Then, it analyzes those responses and compares them against a database of WAF signatures.
That's possible because firewalls can return different HTTP responses depending on their configuration and the specific rules set. So, by matching target responses to a WAF, WhatWaf can determine the server's security system.
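To make the matching idea concrete, here is a minimal sketch of signature-based detection. The `SIGNATURES` table and the `detect_wafs` function are hypothetical and exist only for illustration; they are not WhatWaf's actual rules or code, which cover far more firewalls.

```python
import re

# Hypothetical signatures for illustration only. WhatWaf's real detection
# rules are more extensive and are not reproduced here.
SIGNATURES = {
    "Cloudflare": [
        ("header", "server", re.compile(r"cloudflare", re.I)),
        ("header", "cf-ray", re.compile(r".+")),
        ("body", None, re.compile(r"Attention Required! \| Cloudflare")),
    ],
    "ASP.NET": [
        ("header", "x-powered-by", re.compile(r"asp\.net", re.I)),
        ("header", "x-aspnet-version", re.compile(r".+")),
    ],
}


def detect_wafs(headers, body=""):
    """Return the names of all signatures with at least one matching rule."""
    lowered = {key.lower(): value for key, value in headers.items()}
    found = []
    for name, rules in SIGNATURES.items():
        for kind, key, pattern in rules:
            # Header rules inspect one header value; body rules inspect the HTML.
            value = lowered.get(key, "") if kind == "header" else body
            if value and pattern.search(value):
                found.append(name)
                break  # one matching rule is enough for this signature
    return found


# Sample response headers, made up for this example.
sample_headers = {
    "Server": "cloudflare",
    "CF-RAY": "abc123-LHR",
    "X-Powered-By": "ASP.NET",
}
print(detect_wafs(sample_headers))  # ['Cloudflare', 'ASP.NET']
```

A real detector would weigh several rules together and send more than one request, but the core step is the same: compare what the server returns with known fingerprints.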
In addition, WhatWaf performs a bypass analysis to recommend techniques to circumvent the firewalls, such as payload modification, tampering, obfuscation, etc.
Prerequisites
To get started, ensure you have Python installed. Most Linux distributions ship with it pre-installed, but you can confirm this by running the following command:
python --version
# for Python 3.x
python3 --version
After that, clone WhatWaf's GitHub repository:
git clone https://github.com/Ekultek/WhatWaf.git
That automatically saves the tool's source code, which includes configuration files, in the WhatWaf directory.
Lastly, navigate to the WhatWaf directory and install the necessary dependencies using the following commands:
cd WhatWaf
sudo pip install -r requirements.txt
How to Use WhatWaf
Run the following command to view the arguments we can use with the tool:
sudo ./whatwaf --help
You'll get this:
From the result above, we see that we can pass a single URL to detect its WAF using the -u <target URL> format. Let's try it with Hack Yourself First, a security training resource.
sudo ./whatwaf -u "https://hack-yourself-first.com/Make/5?orderby=supercarid"
Here's the first half of our result:
WhatWaf detected two firewalls: Microsoft's ASP.NET and Cloudflare.
Here's the second half of the result, containing potential bypasses:
The descriptions outline different tampering techniques that can be applied to modify the payload in a specific way to facilitate a WAF bypass. We can incorporate them into our scraping strategy to get around the detected firewalls.
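As a rough illustration of what payload tampering means, the sketch below applies three generic transformations to a query value. The function names are hypothetical and the techniques are common textbook examples; they are not WhatWaf's own tamper scripts, and whether any of them helps depends entirely on the target's rules.

```python
import random
from urllib.parse import quote


def url_encode(payload):
    """Percent-encode every reserved character in the payload."""
    return quote(payload, safe="")


def space_to_comment(payload):
    """Replace each space with an inline SQL-style comment."""
    return payload.replace(" ", "/**/")


def random_case(payload, seed=None):
    """Randomly switch each character between upper and lower case."""
    rng = random.Random(seed)
    return "".join(rng.choice((char.upper(), char.lower())) for char in payload)


# Example value, made up for this sketch.
payload = "order by supercarid"
for tamper in (url_encode, space_to_comment, random_case):
    print(f"{tamper.__name__}: {tamper(payload)}")
```

Each function returns a string that a lenient backend may treat like the original value but that looks different to a pattern-based filter, which is the general idea behind the techniques WhatWaf lists.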
However, these bypass techniques aren't foolproof and only work in some situations. Furthermore, WhatWaf's bypass analysis doesn't work for websites using advanced protection.
Let's try investigating G2, a product review website.
sudo ./whatwaf -u https://g2.com/
Here's the first half of our result. We can see that G2 is behind Cloudflare:
Let's see the result of the bypass analysis. Unfortunately, WhatWaf doesn't succeed in detecting possible bypasses.
That raises the question: What works against advanced WAFs? Read on to find out.
Best WhatWaf Alternative
WhatWaf's bypass analysis fails against advanced protection and isn't efficient or scalable. ZenRows takes a different approach: a single API call handles the anti-bot bypass and returns the data you need. Let's use ZenRows to scrape G2, where WhatWaf failed.
To follow along, sign up to get your free API key. You'll land on the Request Builder, where you can generate the scraping code. Enter the target URL and check the boxes for Premium Proxies and JS Rendering to set these parameters to true. In our case, we chose Python as the language.
Install Python Requests using the following command (any other HTTP library also works).
pip install requests
Then, copy the code ZenRows provided and run it in your favorite editor.
# pip install requests
import requests
url = 'https://g2.com/'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
The page's HTML will print:
//..
<title>Business Software and Services Reviews | G2</title>
//..
Congrats, you've finally bypassed Cloudflare.
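Once you have the HTML, you'll usually want to pull specific values out of it. Here is a minimal sketch that extracts the page title using only Python's standard library. The `TitleParser` class and `extract_title` helper are illustrative names, and the sample string stands in for the `response.text` returned by the request above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped contents of the first <title> element, or ''."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Stand-in for response.text from the ZenRows request.
sample = "<html><head><title>Business Software and Services Reviews | G2</title></head></html>"
print(extract_title(sample))  # Business Software and Services Reviews | G2
```

For anything beyond a single tag, a dedicated parsing library is a more practical choice, but the standard library is enough to confirm that the response contains the real page rather than a challenge screen.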
Conclusion
Knowing how to get past WAFs is critical for any data extraction project.
While WhatWaf can detect what firewall a target website uses, its bypass analysis fails against advanced measures. Fortunately, ZenRows offers an effective and scalable solution. Try it for free.