Many websites use advanced systems to block bot traffic, including scrapers. The good news is that in this article you'll learn to use Wafw00f to identify the firewall protecting a site, which is the first step toward getting past Cloudflare and other WAFs.
Let's dive in!
What Is Wafw00f Used for
Wafw00f is an open-source tool designed to identify and fingerprint Web Application Firewalls (WAFs). It helps you determine if a website is behind a WAF and provides insights into its characteristics to help you bypass the anti-bot system.
How Wafw00f Works
Wafw00f first sends a normal HTTP request to the target website and compares patterns in the response against a database of known WAF signatures. If that doesn't yield a match, it sends several deliberately suspicious requests and checks how the server reacts. If that's also inconclusive, it uses a simple algorithm that draws on the responses gathered so far to guess whether a WAF is in place.
Here's a structural overview of how Wafw00f works:
- Detecting the presence of a WAF: Wafw00f analyzes response headers, response body, and server response characteristics, amongst other things, to establish the presence of a firewall.
- Signature database: Wafw00f matches the observed server response patterns against a database of signatures associated with different WAFs to identify what it is up against. These signatures include known error messages, headers, and blocking patterns exhibited by different WAFs.
- Fingerprinting: Once a WAF is detected, Wafw00f gathers additional information about it, which you can use to plan how to get past the firewall.
- Reporting: Lastly, Wafw00f provides a report naming the detected WAF and its vendor, along with how many requests it took to reach that result.
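To make the signature-matching step more concrete, here's a minimal sketch in Python. It isn't Wafw00f's actual code: the `SIGNATURES` table and the `identify_waf` function are hypothetical, and the "Example WAF" entry is made up purely for illustration. The real signature database is much larger and its rules are more elaborate.

```python
import re

# Hypothetical, heavily simplified signature table (an assumption for
# illustration, not Wafw00f's real database).
SIGNATURES = {
    "Cloudflare": {
        "headers": {"server": r"cloudflare", "cf-ray": r".+"},
        "body": r"attention required! \| cloudflare",
    },
    "Example WAF (made up)": {
        "headers": {"x-example-waf": r".+"},
        "body": r"blocked by example waf",
    },
}


def identify_waf(headers, body=""):
    """Return the names of all signatures that match the given response."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    matches = []
    for waf_name, signature in SIGNATURES.items():
        header_hit = any(
            name in normalized
            and re.search(pattern, normalized[name], re.IGNORECASE)
            for name, pattern in signature["headers"].items()
        )
        body_hit = re.search(signature["body"], body, re.IGNORECASE) is not None
        if header_hit or body_hit:
            matches.append(waf_name)
    return matches


print(identify_waf({"Server": "cloudflare", "CF-RAY": "abc123-LHR"}))
# prints ['Cloudflare']
```

The idea is simply that each WAF leaves recognizable traces in headers or block pages, and a match on any of them is enough to flag it.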
Let's see how to leverage this WAF fingerprinting tool to bypass WAFs and retrieve the desired data.
Prerequisites
Depending on your operating system, there are different approaches to installing Wafw00f.
Install on Linux
Clone the Wafw00f GitHub repository using the following command:
git clone https://github.com/EnableSecurity/wafw00f
Then, navigate to the Wafw00f directory:
cd wafw00f
Run the make command to install the necessary files, then grant execute permission to the setup script:
chmod +x setup.py
Lastly, run the setup script to install Wafw00f:
python setup.py install
Now, you can run Wafw00f:
wafw00f https://example.com/
Install on Windows
For Windows, download Wafw00f's latest release and extract the files to a directory of your choice. Within that directory, navigate to the Wafw00f folder and run the Python setup script:
cd wafw00f
python setup.py install
Here's an example of how to run Wafw00f:
python main.py https://example.com
Alternatively, if you have Docker installed, you can build a Docker image from the cloned GitHub repository:
docker build . -t wafw00f
Run it like in the example below:
docker run --rm -it wafw00f https://example.com
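Whichever installation route you choose, you may eventually want to call Wafw00f from a script instead of typing the command by hand. Here's a minimal sketch, assuming the `wafw00f` command is available on your PATH. The helper names `build_wafw00f_command` and `scan` are ours, not part of Wafw00f. The optional `-a` flag tells Wafw00f to keep testing after the first match instead of stopping.

```python
import shutil
import subprocess


def build_wafw00f_command(url, find_all=False):
    """Build the argument list for a wafw00f scan of a single URL."""
    command = ["wafw00f", url]
    if find_all:
        # -a asks wafw00f to report every matching WAF, not just the first.
        command.append("-a")
    return command


def scan(url, find_all=False, timeout=120):
    """Run wafw00f and return its text output, or None if it isn't installed."""
    if shutil.which("wafw00f") is None:
        return None  # wafw00f is not installed or not on PATH
    result = subprocess.run(
        build_wafw00f_command(url, find_all),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__":
    output = scan("https://example.com/")
    print(output if output is not None else "wafw00f not found on PATH")
```

Passing the command as a list rather than a single string avoids shell quoting problems if a URL contains characters like `&` or `?`.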
How to Use Wafw00f
Now that you're all set up, let's try scraping a website protected by a firewall: G2.
First, run the target website through Wafw00f to determine the WAF and its behavior:
wafw00f https://www.g2.com/
Here's our result:
Congrats, you've identified your first WAF!
From the example above, we see that G2 is behind Cloudflare. Also, the "No WAF detected by generic detection" message indicates that Cloudflare intercepted the requests before redirecting to the G2 URL, which isn't behind any firewall. So, if you can get past Cloudflare, you can retrieve the data you're after.
Getting past Cloudflare requires emulating human behavior using fortified headless browsers and real browser User-Agent strings in your requests. Alternatively, you can sidestep Cloudflare entirely by making requests directly to the target website's origin IP address.
However, both approaches are unreliable and will require tedious manual work. For example, headless browsers still get blocked, and finding the origin server's IP address can be challenging.
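For reference, the header part of the first approach looks roughly like the sketch below. It uses Python's built-in `urllib` so it runs without extra packages. The User-Agent value is just an example string that will go stale over time, and `build_browser_like_request` is a hypothetical helper name. Keep in mind that headers alone are rarely enough against Cloudflare, which is exactly the limitation described above.

```python
import urllib.request

# Example desktop Chrome User-Agent (an assumption for illustration);
# swap in the string from a current browser before relying on it.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_browser_like_request(url):
    """Return a urllib Request that carries browser-style headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


if __name__ == "__main__":
    request = build_browser_like_request("https://www.g2.com/")
    # urllib stores header names capitalized, e.g. "User-agent".
    print(request.get_header("User-agent"))
```

This only builds the request object. Sending it with `urllib.request.urlopen(request)` may still return a block or challenge page.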
Fortunately, there's a more straightforward method to bypass WAFs. Let's find out how.
Best Wafw00f Alternative
ZenRows is a complete anti-bot bypass toolkit that enables users to get around Cloudflare and other WAFs. It works with any language that can make HTTP requests, including Python, Java, Node.js, Ruby, and Go. Let's try to scrape G2, a site protected by Cloudflare, using ZenRows.
Start by signing up to get your free API key. You'll get to the Request Builder page. There, paste the target URL and check the boxes for Premium Proxies and JS Rendering to set the necessary parameters to true. In this case, we chose Python, and the scraper's code is auto-generated.
Now, install Python Requests using the following command (although any HTTP library works).
pip install requests
Lastly, copy the code ZenRows provides and run it in your favorite editor.
# pip install requests
import requests
url = 'https://g2.com/'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
Here's our result:
//..
<title>Business Software and Services Reviews | G2</title>
//..
Bingo, you've bypassed your first WAF!
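If you'd rather confirm the result in code than by eye, one option is to pull the `<title>` out of the returned HTML and compare it with what you expect. The sketch below uses only Python's standard library. `TitleParser` and `extract_title` are hypothetical helper names, and the sample string stands in for `response.text`.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Stand-in for response.text from the request above.
sample = "<html><head><title>Business Software and Services Reviews | G2</title></head></html>"
print(extract_title(sample))
# prints Business Software and Services Reviews | G2
```

A title that doesn't match the target page is a quick hint that you received a challenge or block page instead of the real content.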
Conclusion
Bypassing WAFs with Wafw00f presents some unique challenges that may require tedious work and expertise. Fortunately, ZenRows offers an easier and more scalable solution that empowers you to bypass any anti-bot measures.
In other words, it's unnecessary to investigate a website to identify its WAF. Just plug in ZenRows, and you can retrieve the necessary data. Sign up now to get 1,000 free API credits.