
How to Bypass WAF in 2024: Challenges and Solutions

May 9, 2023 · 9 min read

How many times has a site blocked the requests made by your web scraper? Most of the time, that's due to WAFs, application-level firewalls that come with several defense systems to block undesired traffic.

They rely on several advanced techniques, but keep calm! There's always a way to get around them. Here, you'll learn how to bypass WAF protections and scrape any site.

Let's learn how!

What Does WAF Mean?

WAF stands for Web Application Firewall and is a collection of security tools to protect a site from several attacks and threats. A WAF operates at the application layer of the OSI model. It analyzes HTTP requests and applies a set of rules to identify and block suspicious traffic.

When it comes to web scraping, WAFs represent a major obstacle. That's because scraping requests often look like an attack to a WAF, especially when extracting data at a high rate or in large volumes.

Is It Possible to Bypass a WAF?

Yes, WAF bypass is possible. But since WAFs use many security techniques, there isn't a one-size-fits-all solution. These are several of the most popular protection methods you need to know how to bypass:

  • IP Address Reputation: Blocking requests that come from IPs marked as unreliable or dangerous. You can avoid that with a web scraping proxy.
  • CAPTCHAs: Problems shown on the web pages that are easy to solve for humans but complex for bots. A CAPTCHA proxy will help you circumvent them.
  • Honeypots: Traps for bots embedded in web pages that are invisible to human users. Learn more on how to bypass a honeypot.
  • User behavior analysis: Tracking user activity on a web page to determine whether it's a bot. Prevent that by making your scraping bot simulate a human with a headless browser.
  • Device fingerprinting: Looking for hardware and software features only a real user's device usually has. Dig into browser fingerprinting and how to win over it.

What WAF Do I Need to Bypass?

Knowing which WAF your target site relies on is crucial to building an effective web scraper. Follow the steps below to learn how to identify a WAF:

  1. Explore your target site in the browser.
  2. Look for evident protection methods: Most popular WAFs inform users of what is going on when applying strict anti-bot measures. These control pages usually feature the provider's name, which is essential to know.
G2 Verification

Note the "Performance & security by Cloudflare" footer. There, you can also check whether the site uses security solutions such as CAPTCHAs.

  3. Analyze the HTTP headers: Open the DevTools, interact with the site, and inspect the HTTP response headers of the requests made by the browser. WAFs tend to make specific AJAX calls and set special headers and cookies. For example, Cloudflare sets the cf_clearance cookie.
cf_clearance Cookies

If none of the steps above helped you figure out the WAF, try to perform an automated request to your target page. Use an HTTP client and make a GET request. The response produced by the server may provide useful data:

Server Response

Take a look at the /cdn-cgi/ route for images. That's typical of Cloudflare and also true for the presence of the _cf_chl_opt object.

_cf_chl_opt Object
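
The hints mentioned above (the cf_clearance cookie, the /cdn-cgi/ route, and the _cf_chl_opt object) can be checked automatically. Below is a minimal sketch of that idea. The function name is hypothetical, and the Server header check is an extra assumption that isn't covered above.

```python
from typing import Mapping

# Body markers taken from the article: the /cdn-cgi/ route and the _cf_chl_opt object.
CLOUDFLARE_BODY_MARKERS = ("/cdn-cgi/", "_cf_chl_opt")


def looks_like_cloudflare(
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    body: str,
) -> bool:
    """Return True if the response carries any known Cloudflare hint."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}

    # Assumption beyond the article: Cloudflare-served responses usually
    # report "cloudflare" in the Server header.
    if normalized.get("server") == "cloudflare":
        return True

    # The article's cookie hint.
    if "cf_clearance" in cookies:
        return True

    # The article's body hints.
    return any(marker in body for marker in CLOUDFLARE_BODY_MARKERS)
```

With an HTTP client such as requests, you could pass the response's headers, cookies, and text to this function. A False result doesn't prove there is no WAF. It only means none of these specific hints showed up.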

Only a handful of WAF vendors dominate the market. And because they all adopt different anti-bot protection technologies and security policies, we created specific guides to help you bypass each of them:

  • Cloudflare: It offers a suite of security solutions to protect sites from various types of attacks. Around 20% of all internet websites use it. Learn how to bypass Cloudflare.
  • Akamai: It takes advantage of machine learning to block attacks in real time. You'll find how to bypass Akamai here.
  • PerimeterX: It exploits behavioral analysis to protect web applications from many threats. Check out our guide to find step-by-step instructions on how to bypass PerimeterX.
  • DataDome: It provides a bot detection technology to prevent automated attacks. You can bypass DataDome with the help of our detailed tutorial.
  • Imperva: Imperva uses advanced security measures to protect websites against attacks, such as DDoS, with near-zero false positives and a global SO. If you want to successfully bypass Imperva, check out our guide.

No matter what technology your target site employs, you can bypass its WAF with a full-featured web scraping API such as ZenRows.

Techniques to Bypass WAF

Let's see the best tips and methods to bypass WAF defenses.

1. Use Residential IPs

An effective solution to avoid IP banning is making HTTP requests through a proxy server. Proxy providers typically offer:

  • Data center IPs: Addresses coming from data centers and not associated with any ISP.
  • Residential IPs: Addresses assigned by ISPs to real devices in a specific location.
  • Mobile IPs: Addresses assigned by mobile carriers to individual devices. They're useful for scraping websites that show different content for mobile users.

Data center IP addresses are cheap, but most WAFs can spot them without effort. Residential IPs, by contrast, are much more reliable. Read our guide to the best proxy providers for web scraping to see a list of great options.
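
To spread requests across several residential IPs, a scraper can rotate through a pool of proxies. Here's a minimal sketch of that rotation. The proxy URLs are placeholders (203.0.113.0/24 is a documentation-only range) and the function name is hypothetical, so swap in the addresses from your own provider.

```python
import itertools
from typing import Dict, Iterator, List

# Placeholder proxy URLs, not real endpoints.
RESIDENTIAL_PROXIES: List[str] = [
    "http://user:pass@203.0.113.10:8000",
    "http://user:pass@203.0.113.11:8000",
]


def proxy_cycle(proxy_urls: List[str]) -> Iterator[Dict[str, str]]:
    """Yield requests-style proxy mappings, rotating through the pool forever."""
    for proxy_url in itertools.cycle(proxy_urls):
        # Route both plain HTTP and HTTPS traffic through the same proxy.
        yield {"http": proxy_url, "https": proxy_url}
```

Each call to next() on the generator returns a mapping you can pass as the proxies argument of requests.get(), so consecutive requests leave from different IPs.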

2. Run Fortified Headless Browsers

Headless browsers are a great tool for scraping sites that need JavaScript. Yet, their primary purpose is to build automated tests. Thus, they don't try to hide and may set special headers or variables, which helps WAFs recognize their requests.

There are some libraries to override that default behavior. For example, undetected_chromedriver patches Selenium to make it ready for scraping, and puppeteer-extra-plugin-stealth does the same for Puppeteer and Playwright. They can help you with your WAF bypass.

3. Web Scraping API

A scraping API like ZenRows is a great alternative to avoid getting blocked by WAFs. This technology provides premium proxies and implements sophisticated anti-bot techniques, eliminating all headaches.

Suppose you want to build a web scraping script in Python to extract data from the G2 review page of Asana, which is protected by Cloudflare: https://www.g2.com/products/asana/reviews


If you visit it with Selenium, you'll get the following 403 Forbidden page error:

G2 Access Denied

Now, sign up to ZenRows to receive 1,000 free API credits. Get to the Request Builder page and paste the URL of the target page. Then, check "JavaScript Rendering", "Anti-bot" and "Premium Proxy". Additionally, wait for the .l2 CSS Selector.

ZenRows Dashboard

You'll get the Python code below for the Proxy mode:

program.py
import requests

url = "https://www.g2.com/products/asana/reviews"
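# Proxy URL generated by the Request Builder. The part after "js_render=true&"
# is obscured here, so copy the full string from your dashboard.
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&[email protected]:8001"
proxies = {"http": proxy, "https": proxy}
# verify=False skips TLS certificate verification, as in the generated snippet.
response = requests.get(url, proxies=proxies, verify=False)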
print(response.text)

This time, you'll get a 200 response. Bye-bye, 403 error!

Wow! With a single API call, you get rid of all the protections of the most powerful WAF provider on the market!

4. Call the Origin Server

The idea behind a WAF is to create a firewall network and protect the content inside it. But what if you could pass through it and directly contact the origin server to circumvent all defenses?

First, use services like Shodan or tools like CloudFlair to get the IP address of the host server. Then, forge some requests to make them appear as coming from a valid domain name and contact the server.

Keep in mind that this technique isn't always possible because finding the origin server IP is hard for most targets. Also, faking the right requests takes time and effort.
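
As a rough illustration of the second step, the sketch below builds a request that targets an IP address while still naming the real domain in the Host header. The IP and domain are placeholders, and the function name is hypothetical. It uses plain HTTP to keep things simple. Over HTTPS, you'd also have to deal with the certificate and the server name sent during the TLS handshake.

```python
import urllib.request


def build_origin_request(
    origin_ip: str, domain: str, path: str = "/"
) -> urllib.request.Request:
    """Build a GET request aimed at the origin IP that still names the real domain."""
    return urllib.request.Request(
        f"http://{origin_ip}{path}",
        # The Host header tells the origin server which site is being requested.
        headers={"Host": domain},
        method="GET",
    )


# Placeholder values: 203.0.113.10 is a documentation-only address.
request = build_origin_request("203.0.113.10", "example.com")
```

You could then send the request with urllib.request.urlopen(request). Many origin servers only accept traffic coming from their WAF, so expect this to fail on well-configured targets.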

5. Use WAF Solvers

These are tools or services claiming to be able to bypass WAF challenges. They work by analyzing the protection methods and modifying the HTTP traffic accordingly. Some popular WAF solvers are:

  • BypassWAF: It tries to overcome firewalls by looking for old DNS A records and verifying if the origin server replies to that domain.
  • Cfscrape: An open-source Python module that allows you to go around Cloudflare protection.
  • Cloudscraper: A Python library to avoid the Cloudflare waiting room, also known as "I'm Under Attack Mode" (IUAM).

Most of these solutions work for a limited time as they aren't maintained or kept up to date.

6. Reverse Engineer the JavaScript Challenge

JavaScript challenges involve code injected on the page by the WAF. The browser runs the snippet and transparently overcomes it. If the scraper can't solve the test, it gets marked as a bot and blocked.

That's one of the most common protections used by WAFs, which is why you need to know how to bypass it. The only way to do so is to analyze the injected snippet, reverse engineer the JavaScript code, and study how the challenge works.

For a real-world example, check out our guide on how to address the Cloudflare "waiting room" challenge.

7. Get Around CAPTCHAs

CAPTCHAs are a common tool websites use to prevent bots from accessing their content. To get around them, you have two solutions:

  • CAPTCHA solving services: These are usually expensive and failure-prone.
  • Prevent them from appearing: Proxies offer tools to help you avoid them in the first place.

In most cases, the second option is the most effective solution. Consult our guide on the best CAPTCHA proxies to learn more.

8. Don't Fall Into Honeypot Traps

Honeypot traps involve fake pages, links, or forms that human users can't see but are visible to bots. When an automated visitor interacts with them, the website can detect and ban them.

Make your bot smart to avoid falling into such traps. Don't click on non-visible links or fill out hidden form fields. Avoid interacting with HTML elements styled with display: none, and make sure your target page is a real one.
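
The advice above can be sketched with Python's built-in HTML parser. This is only an illustration, and the class name is hypothetical. It inspects the inline style and the hidden attribute of each link, so it won't catch links hidden through CSS classes, external stylesheets, or hidden parent elements.

```python
from html.parser import HTMLParser
from typing import List


class VisibleLinkCollector(HTMLParser):
    """Collect href values from <a> tags that are not obviously hidden."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # Normalize the inline style so "display: none" and "display:none" match.
        style = (attributes.get("style") or "").replace(" ", "").lower()
        hidden = (
            "hidden" in attributes
            or "display:none" in style
            or "visibility:hidden" in style
        )
        if not hidden:
            self.links.append(href)


sample_html = (
    '<a href="/reviews">Reviews</a>'
    '<a href="/trap" style="display: none">Secret</a>'
    '<a href="/trap-2" hidden>Secret</a>'
)
collector = VisibleLinkCollector()
collector.feed(sample_html)
visible_links = collector.links  # only "/reviews" survives the filter
```

When scraping with a headless browser, checking the element's rendered visibility is usually more reliable than parsing attributes by hand.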

For more info, read our article on what a honeypot trap is and how to bypass it.

9. Bypass Browser Fingerprinting

Browser fingerprinting is about gathering information on a user's browser. The idea is to harness that data to uniquely identify a user and limit the number of requests allowed. Visit our article on browser fingerprinting to find out more about how it works.

A particular approach to this WAF method is canvas fingerprinting. A script forces the browser to generate an image using the user's browser specifications as parameters. Different computers will render different canvas images, making the user easy to spot based on that.

10. Get Around TLS Fingerprinting

TLS fingerprinting is a method to recognize a user by studying the parameters exchanged during the TLS handshake between client and server.

WAFs observe and record all TLS connections. They track who initiates a conversation with the server and decide whether to block or allow the request. Explore how to bypass WAF using this technology in our guide on TLS fingerprinting.
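
To make the idea concrete, here's a sketch of one widely used fingerprinting scheme, JA3. The article doesn't name a specific scheme, so treat JA3 as an assumption. It joins five values from the client's handshake message into a string and hashes it with MD5. The sample values below are made up for illustration, and the function name is hypothetical.

```python
import hashlib
from typing import Sequence


def ja3_style_hash(
    tls_version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Hash handshake parameters the way JA3 does: fields joined by commas,
    values inside a field joined by dashes, then MD5."""
    fields = [
        str(tls_version),
        "-".join(str(value) for value in ciphers),
        "-".join(str(value) for value in extensions),
        "-".join(str(value) for value in curves),
        "-".join(str(value) for value in point_formats),
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


# Made-up sample values, not captured from a real client.
fingerprint = ja3_style_hash(771, [4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
reordered = ja3_style_hash(771, [4866, 4865, 4867], [0, 10, 11], [29, 23, 24], [0])
```

The two hashes differ even though the clients support the same ciphers, because the order matters. That's why an HTTP client library can be told apart from a real browser even when it sends browser-like headers.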

11. Understand Event Tracking

WAFs keep collecting data about the user. They observe when, how, and what elements you interact with. By looking for known patterns, they can tell whether you're human.

A way to hinder this protection is to program your bot to behave as naturally as possible with the help of a headless browser.
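
One small piece of behaving naturally is avoiding perfectly regular timing between actions. Here's a minimal sketch that generates randomized pauses. The function name and the default values are assumptions to tune for your target.

```python
import random
from typing import List, Optional


def human_like_delays(
    actions: int,
    base: float = 1.5,
    jitter: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one pause (in seconds) per action, each between base and base + jitter."""
    rng = rng or random.Random()
    return [base + rng.uniform(0, jitter) for _ in range(actions)]


delays = human_like_delays(5)
```

You'd call time.sleep() with each value between clicks, scrolls, or page loads in your headless browser script. Timing is only one signal, so this won't hide a bot that gives itself away in other ways.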

Conclusion

You learned a lot about how to bypass WAF protection in this article:

  • What a WAF is, and which are the most common providers.
  • Whether it's possible to get around it.
  • What the most common defenses used by these technologies are.

The techniques presented here are only part of the entire arsenal available to WAFs. Plus, these protective measures keep evolving, and circumventing them becomes more difficult every day. Finding workarounds and keeping them up-to-date takes too much time and effort.

The solution? An all-in-one web scraping API like ZenRows. This cutting-edge tool offers the best WAF bypassing capabilities on the market. With a single API call, you get the content you want and forget about all anti-scraping systems.

