How to Bypass Sucuri in 2024

August 11, 2023 · 7 min read

Sucuri is one of the reasons you get error response codes and "Access Denied" messages when attempting to scrape websites. In this guide, you'll learn how it works and how to bypass Sucuri's firewall.

Let's get started!

What Is Sucuri

Sucuri is a Web Application Firewall (WAF) designed to protect websites from various online threats, including hacking attempts and DDoS attacks. Unfortunately, that includes all sorts of bot traffic regardless of their intention, which poses a challenge to web scraping.

How Sucuri Works

The Sucuri WAF works as a reverse proxy between an origin web server and incoming traffic. When you send a request to a Sucuri-protected website, the WAF intercepts, analyzes, and filters it before forwarding it to the origin server.

Non-human requests are mostly denied access, hence the error message you receive.
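To react to a block in a scraper, it helps to recognize it first. The sketch below is a heuristic only: it assumes that Sucuri-served responses carry an `X-Sucuri-ID` header or a `Server` value containing "Sucuri", and that block pages return a 403 status or include the words "Access Denied". These markers are commonly reported but not guaranteed, and the function name is hypothetical.

```python
def looks_like_sucuri_block(status_code, headers, body):
    """Guess whether a response is a Sucuri block page (heuristic sketch)."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Assumed markers of a response served through the Sucuri WAF.
    served_by_sucuri = (
        "x-sucuri-id" in lowered
        or "sucuri" in lowered.get("server", "").lower()
    )

    # Assumed markers of a denied request.
    denied = status_code == 403 or "access denied" in body.lower()

    return served_by_sucuri and denied
```

With the Requests library, you could call it as `looks_like_sucuri_block(response.status_code, response.headers, response.text)` and switch strategy when it returns `True`.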

The system uses evolving techniques classified under behavioral analysis, signature-based tracking, and heuristics to identify and block bots accordingly. Here are some of its bot mitigation methods:

  • Bot distinction: The Sucuri WAF uses different layers of security, each with advanced algorithms to analyze request patterns and behavior. It matches request attributes to known patterns in human behavior to distinguish bots from natural users.
  • Custom security rules: Sucuri lets site owners set up their own WAF rules. They can enable extra protection features, such as 2FA, CAPTCHAs, IP allowlisting, and IP rate limiting, which makes it harder for web scrapers to bypass the firewall.
  • Blocklisting: Sucuri matches request User Agents against a database of known malicious ones to filter out unwanted traffic.
  • Signature detection: Sucuri combines insights from a database of known bot signatures with heuristic analysis to detect and block bots.

Sucuri uses many other anti-bot techniques and, like most web security solutions, it doesn't disclose its inner workings. Nonetheless, we can learn how to create a Sucuri firewall bypass.
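To make the blocklisting idea concrete, here is a toy illustration of User Agent matching. It is not Sucuri's actual logic, which is undisclosed: the fragment list and the function name are assumptions made up for this example. It does show why an unmodified HTTP client is easy to flag, since Python Requests identifies itself as `python-requests/<version>` by default.

```python
# Illustrative fragments only; a real WAF would use a far larger, private database.
BLOCKLISTED_UA_FRAGMENTS = ("python-requests", "curl", "scrapy")


def is_blocklisted(user_agent):
    """Return True if the User Agent is empty or contains a blocklisted fragment."""
    ua = user_agent.strip().lower()
    if not ua:
        # A missing User Agent is treated as suspicious in this sketch.
        return True
    return any(fragment in ua for fragment in BLOCKLISTED_UA_FRAGMENTS)
```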

Frustrated that your web scrapers are blocked once and again?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

How to Bypass Sucuri

Your best bet to bypass the Sucuri firewall is either emulating human behavior or sidestepping its WAF altogether. Let's explore both options using the following solutions.

1. Web Scraping API

A web scraping API like ZenRows is the easiest and most reliable way to bypass Sucuri, as it handles all the technicalities of emulating human behavior for you.

ZenRows is an all-in-one solution designed to bypass anti-bot systems, including Sucuri. It comes with features like premium rotating proxies, auto-rotating User Agents, headless browsers, anti-CAPTCHA, and more.

Let's see it in action against a Sucuri-protected web page: FantasySP.

Sucuri protected web page

To follow along in this example, sign up to get your free API key. After that, you'll get to the Request Builder, where you'll input your target URL: https://www.fantasysp.com/nba_player_news/Nikola_Jokic/, and check the box for premium proxies to activate the parameter. Also, choose your favorite programming language.

Zenrows Request builder

That will generate a request code on the right. Copy it to your IDE.

Now, install the Python Requests library (or any other HTTP request library) using the following command:

Terminal
pip install requests

Your Sucuri firewall bypass script should look like this:

program.py
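import requests

# Sucuri-protected page we want to scrape
url = 'https://www.fantasysp.com/nba_player_news/Nikola_Jokic/'
# Replace with the API key from your ZenRows account
apikey = 'Your API Key'
params = {
    'url': url,
    'apikey': apikey,
    'premium_proxy': 'true',  # route the request through premium proxies
}
# ZenRows fetches the target page on your behalf and returns its HTML
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)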

Here's the result:

Output
//..
<title>Nikola Jokic Fantasy 2023 Basketball News, Stats, Ranking, Projections, Add/Drop Advice</title>
 
//..
<meta name="twitter:title" value="”
 
Conversation with the Finals MVP, Nikola Jokic for NBA Today:">
<meta name="twitter:description" value="'Basketball is not the main thing in my life. It's something that I'm good at.'

Awesome, right? It's that easy to bypass Sucuri with ZenRows.

Compared to traditional proxies, web scraping APIs like ZenRows simplify scraping by taking proxy management off your hands, including per-request rotation, proxy configuration, and maintenance. That streamlines your workflow and lets you focus on extracting the data you need.

While ZenRows required only the "Premium Proxies" parameter in the example above, it applies multiple scraping techniques by default, like header rotation.

You may also come across other web pages with more advanced anti-bot protection. For such cases, activating parameters like JavaScript rendering and advanced anti-bot bypass in your script will yield your desired result.
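As a rough sketch of how such parameters could be switched on, the helper below extends the `params` dictionary from the earlier script. The `js_render` and `antibot` parameter names are assumptions here, and the helper itself is hypothetical, so confirm the exact names in the Request Builder or the ZenRows documentation before relying on them.

```python
def build_zenrows_params(target_url, api_key, js_render=False, antibot=False):
    """Build the query parameters for a ZenRows request (hypothetical helper)."""
    params = {
        "url": target_url,
        "apikey": api_key,
        "premium_proxy": "true",
    }
    if js_render:
        # Assumed parameter name for JavaScript rendering.
        params["js_render"] = "true"
    if antibot:
        # Assumed parameter name for the advanced anti-bot bypass.
        params["antibot"] = "true"
    return params


# Usage with Requests (not executed here):
# requests.get("https://api.zenrows.com/v1/", params=build_zenrows_params(url, apikey, js_render=True))
```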

The methods below are less reliable. Still, it helps to know all your options for bypassing the Sucuri firewall.

2. Direct IP Address to Bypass Sucuri

As stated earlier, the WAF acts as a reverse proxy between the origin web server and incoming requests. When a website uses Sucuri, its actual IP address is hidden but often unprotected. Therefore, by sending requests directly to the origin web server, you can skip Sucuri entirely and retrieve the data you need.

However, the challenge lies in uncovering the origin server's IP address. Reconnaissance tools claim to discover these IPs, but they often fail, particularly against advanced anti-bot protection like Sucuri.

These tools generally leverage various techniques to gather information about a target domain that can lead to its actual IP address. They mostly rely on exploiting domain configuration vulnerabilities, which means a proper domain configuration can block this path.
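If you do find the origin IP, a direct request usually means swapping the hostname in the URL for the IP and sending the original hostname in the `Host` header so the server knows which site you want. The sketch below only builds that URL and header. The function name is hypothetical, and `203.0.113.10` in the usage comment is a documentation-only placeholder address, not the real origin of any site.

```python
from urllib.parse import urlsplit, urlunsplit


def build_direct_request(target_url, origin_ip):
    """Return (url, headers) for requesting target_url straight from origin_ip."""
    parts = urlsplit(target_url)
    # Replace the hostname with the origin IP, keeping scheme, path, and query.
    direct_url = urlunsplit((parts.scheme, origin_ip, parts.path, parts.query, ""))
    # The Host header tells the origin server which website is being requested.
    headers = {"Host": parts.hostname}
    return direct_url, headers


# Usage with Requests (not executed here):
# url, headers = build_direct_request(
#     "https://www.fantasysp.com/nba_player_news/Nikola_Jokic/", "203.0.113.10"
# )
# response = requests.get(url, headers=headers)
```

Over HTTPS, certificate verification will typically fail because the certificate is issued for the hostname rather than the IP, and a well-configured origin may only accept traffic coming from the WAF, so treat this as an illustration of the idea rather than a dependable method.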

Still, if you want to try out this approach, here are some IP reconnaissance tools you can use to bypass Sucuri, Cloudflare, PerimeterX, and other systems:

3. Randomizing User-Agent

The User Agent (UA) is an HTTP header sent with every request and contains information the target web server uses to tailor its response. If you perform many requests using the same UA, or if it doesn't look genuine or it's outdated, chances are you'll get blocked.

The specific details included in a User Agent string can vary depending on the client making the request, but they generally include the following components:

  • Browser name.
  • Browser version.
  • Rendering engine, such as "Gecko" for Firefox or "WebKit" for Safari.
  • Operating system: like "Windows", "Mac OS X", "iOS", or "Android".
  • Device type: "Desktop", "Mobile", "Tablet", or specific device models, like "iPhone" or "iPad".
  • Other information: language preferences, screen resolution, or other capabilities of the client software or device.
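To see those components in a real string, the sketch below pulls the platform details and the `name/version` product tokens out of a desktop Chrome User Agent with two regular expressions. It is a simplified illustration with a hypothetical function name, not a full User Agent parser, and the sample string will go stale as browsers update.

```python
import re

SAMPLE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def split_user_agent(user_agent):
    """Extract the platform segment and name/version tokens from a User Agent."""
    # The first parenthesized group usually holds operating system and device info.
    platform_match = re.search(r"\(([^)]*)\)", user_agent)
    platform = platform_match.group(1) if platform_match else ""
    # Product tokens look like "Chrome/120.0.0.0" or "AppleWebKit/537.36".
    products = dict(re.findall(r"([A-Za-z]+)/([\d.]+)", user_agent))
    return {"platform": platform, "products": products}


info = split_user_agent(SAMPLE_UA)
print(info["platform"])
print(info["products"])
```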

Just emulating a browser's UA is not enough. You need to rotate it, either on every request or after a set number of requests. Check out our top list of User Agents for web scraping and our guide on User Agents for Python Requests to get started.
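A minimal sketch of that rotation could look like the generator below, which picks a random User Agent and reuses it for `rotate_every` requests before picking again. The three strings are sample values that will age quickly, and `rotating_user_agents` is a hypothetical helper name, so swap in an up-to-date list for real use.

```python
import random

# Sample User Agents for illustration; keep a fresh list in real projects.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
]


def rotating_user_agents(rotate_every=1, rng=random):
    """Yield a User Agent per request, choosing a new random one every rotate_every requests."""
    while True:
        user_agent = rng.choice(USER_AGENTS)
        for _ in range(rotate_every):
            yield user_agent


# Usage with Requests (not executed here):
# agents = rotating_user_agents(rotate_every=5)
# response = requests.get(url, headers={"User-Agent": next(agents)})
```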

4. Search Engine's Cache to Bypass Sucuri

When search engine bots, like Google's, crawl websites for indexing, they typically cache the pages they visit. That presents another opportunity to get around the Sucuri firewall. Like most anti-bot solutions, Sucuri has an allowlist for search engines, so you can retrieve website content by sending requests to the cached copies instead of the protected site.

To scrape a webpage cached by Google, make your requests using this URL pattern:

Terminal
https://webcache.googleusercontent.com/search?q=cache:{website_url}

Remember to replace {website_url} with your target URL. Using the target URL from the example above, the cached URL will look like this:

Terminal
https://webcache.googleusercontent.com/search?q=cache:https://www.fantasysp.com/nba_player_news/Nikola_Jokic/
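In a script, that pattern boils down to simple string concatenation. The helper below is a small sketch with a hypothetical function name. It only builds the cached URL and does not check whether a cached copy actually exists.

```python
GOOGLE_CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(website_url):
    """Return the Google cache URL for the given page URL."""
    return GOOGLE_CACHE_PREFIX + website_url


print(google_cache_url("https://www.fantasysp.com/nba_player_news/Nikola_Jokic/"))
```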

However, bear in mind that scraping cached pages is unreliable: the content is a static snapshot that might be out of date, and not all websites allow search engines to cache their pages.

Conclusion

Sucuri presents another challenge for web scrapers. We saw some methods to get around this WAF, and the most reliable one turned out to be a web scraping API like ZenRows. Sign up to get 1,000 free API credits and try it out.
