Does your scraper keep hitting the Imperva anti-bot screen? Imperva Incapsula is among the most popular anti-scraping measures on the internet, meaning bypassing it has become necessary to extract data successfully.
We've got you covered! In this guide, you'll learn how to bypass Imperva protection using both DIY methods and managed solutions:
- Method #1: Implement fortified headless browsers.
- Method #2: Scrape archived or cached pages.
- Method #3: Use smart proxies to get past Imperva Incapsula.
- Best method: Use a web scraping API.
We'll use Harvey Norman, an Imperva Incapsula-protected website, to show how each method works. But first, let's learn more about the system itself.
What Is Imperva (Incapsula)?
Imperva Incapsula (formerly Incapsula) is a web application firewall (WAF) that uses advanced web security measures to protect websites against attacks, such as DDoS, by blocking traffic that appears to be non-human.
The Imperva firewall acts as an intermediary between your browser/scraper and the target website's server. It screens every request for suspicious activities and only allows trusted traffic to proceed. Unfortunately, it also specifically implements web scraping defenses and an advanced bot management system to block bots and web scraping-related activities.
Despite being one of the oldest WAFs, Imperva has improved significantly since 2023, releasing sophisticated bot-detection methods to clamp down on new and emerging bot techniques. This makes bypassing it increasingly challenging for web scrapers. Let's see how you can identify it during scraping.
Common Imperva Block Page Messages
Imperva typically displays an anti-bot page to block web scraping attempts, similar to other WAFs like Akamai and PerimeterX. If you're scraping with an HTTP client, the block page can return errors, such as Imperva/Incapsula 403. However, you might also receive a 200 OK status code, as the block page itself is a valid HTML response.
Here are common block messages that indicate you've been blocked by Imperva:
- "Incapsula incident ID" embedded in an iframe.
- "Powered by Imperva" text returned with a CAPTCHA.
- "x-cdn: Imperva" in the response headers.
- "_Incapsula_Resource" in the script and iframe tags.
- "subject=WAF Block Page" in the response HTML.
- "visid_incap_" and "incap_ses" in the Set-Cookie header field.
- "X-Iinfo" in the response headers.
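Since a block page can arrive with a 200 status, checking the status code alone isn't enough. Here's a minimal sketch of a heuristic that looks for the markers listed above. The function name and the exact matching rules are our own assumptions, and some markers may also show up on pages that loaded normally, so treat a match as a hint rather than proof:

```python
# Body markers taken from the block messages listed above (matched case-insensitively).
BODY_MARKERS = (
    "incapsula incident id",
    "powered by imperva",
    "_incapsula_resource",
    "subject=waf block page",
)
# Cookie name prefixes that Imperva sets via the Set-Cookie header.
COOKIE_MARKERS = ("visid_incap_", "incap_ses")


def looks_like_imperva_block(status_code, headers, body):
    """Return True if a response resembles an Imperva block page (heuristic only)."""
    text = body.lower()
    if any(marker in text for marker in BODY_MARKERS):
        return True
    # Without body markers, require a 403 plus at least one Imperva-specific header.
    normalized = {key.lower(): str(value).lower() for key, value in headers.items()}
    served_by_imperva = (
        "x-iinfo" in normalized
        or "imperva" in normalized.get("x-cdn", "")
        or any(name in normalized.get("set-cookie", "") for name in COOKIE_MARKERS)
    )
    return status_code == 403 and served_by_imperva
```

With Requests, you could call it as looks_like_imperva_block(response.status_code, response.headers, response.text) right after each request and stop or switch strategy when it returns True.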
Bypassing Imperva Incapsula is possible. But first, you need to understand its detection techniques.
How Does Imperva Incapsula Detect Bots?
When a user tries to access an Incapsula-protected website, the WAF receives and analyzes the request before getting the content from the source server. Imperva then returns a trust score based on the results of this analysis.
However, due to advanced bot detection techniques, web scrapers rarely get past the initial analysis stage. Let's discuss Imperva's detection mechanisms below.
TLS Fingerprinting
TLS (Transport Layer Security) fingerprinting is one of the first detection techniques used by Imperva Incapsula before the server fully establishes a secure connection with a client. Imperva uses TLS fingerprinting to analyze and fingerprint server-client communication, which starts with a TLS handshake, where the client sends a "ClientHello" message to the server.
Imperva then employs techniques like JA3 and JA4 to analyze the parameters in the TLS handshake (particularly the "ClientHello" message) to generate a unique hash or fingerprint for different clients. During the "ClientHello" phase, the client lists the parameters it supports, including the TLS version, cipher suites, extensions, signature algorithms, and more.
These parameters can then be matched against a database of known fingerprints to identify the client type or detect unusual patterns. Additionally, since web scraping tools tend to use different ciphers and encryption from real clients, they can easily be detected through TLS fingerprinting.
Because JA3 and the more advanced JA4 hashes are derived from the TLS handshake itself, spoofing TLS fingerprints can be more challenging than evading basic browser fingerprinting. For instance, even if you spoof HTTP headers like the User-Agent to mimic a real browser, the underlying TLS fingerprint remains unchanged unless you explicitly reconfigure it with a library that can imitate a browser's TLS handshake.
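To make the idea concrete, here's a small sketch of how a JA3 fingerprint is built: five "ClientHello" fields are written as decimal values, joined into one string, and hashed with MD5. The two sets of values below are hypothetical and only illustrate the format. Real implementations also drop GREASE values before hashing, which this sketch skips:

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields, values joined by '-'."""
    fields = ([tls_version], ciphers, extensions, curves, point_formats)
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_hash(*fields):
    """Return the MD5 hex digest of the JA3 string (the 32-character fingerprint)."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Hypothetical ClientHello values for two clients (771 is the decimal form of TLS 1.2).
browser_like = (771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
script_like = (771, [4866, 4867, 4865], [0, 11, 10], [29, 23], [0])

print(ja3_string(*browser_like))  # 771,4865-4866-4867,0-23-65281-10-11,29-23-24,0
print(ja3_hash(*browser_like) == ja3_hash(*script_like))  # False
```

Note that the two clients differ only in cipher order and advertised extensions, yet they produce different hashes. That's why changing the User-Agent alone doesn't change what the WAF sees at the TLS layer.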
HTTP Request Analysis
Scanning the request headers is also one of Imperva's initial detection methods. Header fields, such as the User-Agent, contain information that tells the server whether a client is a human.
The web application firewall (WAF) scans incoming requests against a database of known bot signatures or based on the website's header policies. Any deviation from the expected header values can result in detection and subsequent blocking. Browsers typically send headers in a specific order. If your request header strings deviate from the expected order, it can expose you as a web scraper.
Additionally, the anti-bot checks your HTTP version. Since most modern browsers use HTTP/2 or HTTP/3, using an older version like HTTP/1.0 or HTTP/1.1 can signal bot-like activity.
To reduce the chances of detection via HTTP analysis, use the recommended request headers for web scraping. Then, use HTTP clients that support HTTP/2+ protocols.
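As a rough sketch of that advice, the snippet below sends browser-like headers over HTTP/2 using the HTTPX library (Requests only speaks HTTP/1.1). The header values are example Chrome-style values we chose for illustration, and the fetch helper is a hypothetical name:

```python
# pip3 install "httpx[http2]"

# Example browser-like headers (illustrative values; keep them consistent with each other).
BROWSER_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "upgrade-insecure-requests": "1",
}


def fetch(url):
    """Fetch a URL with browser-like headers, using HTTP/2 when the server supports it."""
    import httpx  # third-party; imported here so the headers can be reused without it

    with httpx.Client(
        http2=True,
        headers=BROWSER_HEADERS,
        follow_redirects=True,
        timeout=30,
    ) as client:
        response = client.get(url)
        # http_version reports the negotiated protocol, e.g. "HTTP/2"
        return response.http_version, response.status_code, response.text


# example usage:
# version, status, html = fetch("https://httpbin.io/headers")
# print(version, status)
```

This only addresses the header and protocol checks. The TLS fingerprint of the client is unchanged, so it won't be enough against Imperva on its own.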
IP Fingerprinting
After a secure connection is established, Incapsula further collects IP data from website visitors and compares it against a known database of malicious IP addresses. If your address has a history of hostile attacks or is associated with botnets, it'll gain a poor reputation, and subsequent requests from it will be banned.
The anti-bot also analyzes traffic data, such as the source, request rate, and frequency, to identify unnatural user behavior. So, sending multiple requests within a short period or regularly violating rate limits can result in an IP ban, which can be temporary or permanent.
Rotating proxies to mask your IP address can boost your scraping activities. However, avoid IPs from data centers or shared ones, as they have a low reputation.
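Here's a minimal sketch of how proxy rotation and irregular request pacing might fit together. The proxy URLs are placeholders, and crawl, proxy_cycle, and jittered_delay are hypothetical helper names rather than part of any library. The fetch argument is whatever function you use to make a request:

```python
import itertools
import random
import time

# Placeholder endpoints; replace with real residential proxy details.
PROXY_POOL = [
    "http://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_ADDRESS_1>:<PROXY_PORT>",
    "http://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_ADDRESS_2>:<PROXY_PORT>",
]


def proxy_cycle(pool):
    """Yield Requests-style proxy mappings in round-robin order."""
    for proxy in itertools.cycle(pool):
        yield {"http": proxy, "https": proxy}


def jittered_delay(min_seconds=2.0, max_seconds=6.0):
    """Pick a random pause so requests are not evenly spaced."""
    return random.uniform(min_seconds, max_seconds)


def crawl(urls, fetch, pool=PROXY_POOL, sleep=time.sleep):
    """Call fetch(url, proxies) for each URL, rotating proxies and pausing between calls."""
    proxies = proxy_cycle(pool)
    results = []
    for index, url in enumerate(urls):
        if index:  # no pause before the first request
            sleep(jittered_delay())
        results.append(fetch(url, next(proxies)))
    return results
```

With Requests, fetch could be as simple as lambda url, proxies: requests.get(url, proxies=proxies, timeout=30). The delay bounds are arbitrary starting points, so tune them to the target site's tolerance.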
Behavior-Based Detection Techniques
Using advanced machine learning techniques, Imperva also employs behavior-based detection methods to analyze the user's interaction patterns on both the server and client sides.
The server-side behavioral analysis approach involves page navigation checks to monitor page interaction timing, patterns, and frequency. The client-side method checks browser/client-based user interactions, such as mouse clicks and movements, keyboard inputs, scrolling patterns, etc.
Imperva collects this behavioral data in real time using obfuscated JavaScript challenges and sends it to its AI system for analysis. Once it spots unusual behavior patterns, it blocks the request.
Although Imperva's behavioral algorithm is smart, you can reduce its detection using headless browser automation tools such as Selenium, Playwright, or Puppeteer.
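A headless browser alone still moves the cursor in perfectly straight lines unless you tell it otherwise. As an illustration of what "human-like" interaction can mean, the sketch below generates a curved, slightly noisy cursor path between two points. The function name, step count, and jitter values are all assumptions for demonstration, not values known to satisfy Imperva:

```python
import random


def human_like_path(start, end, steps=25, jitter=1.5, rng=random):
    """Return points along a curved, slightly noisy path from start to end."""
    (x0, y0), (x2, y2) = start, end
    # A random control point bends the path so it is not a straight line.
    x1 = (x0 + x2) / 2 + rng.uniform(-100, 100)
    y1 = (y0 + y2) / 2 + rng.uniform(-100, 100)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Quadratic Bezier interpolation between start, control point, and end.
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * x1 + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * y1 + t ** 2 * y2
        if 0 < i < steps:  # keep the endpoints exact
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((round(x), round(y)))
    return points
```

You could feed each point to your automation tool's mouse API, for example Playwright's page.mouse.move(x, y), with short random pauses in between.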
Browser and JavaScript Fingerprinting
Imperva also uses browser fingerprinting as part of its detection techniques to create a unique profile for each client by collecting specific information. It also uses JavaScript fingerprinting to analyze a client's unique ability to execute specific client-side scripts.
Some of the information gathered during JavaScript and browser fingerprinting includes the following:
- The operating system type and version.
- JavaScript engine information.
- Browser type.
- Browser vendor.
- Installed plugins.
- Supported language.
- Hardware concurrency.
- Screen resolution.
- Navigator properties.
Clients typically exhibit slight differences in these fingerprints, making each unique. Imperva leverages the differences between these data points to identify each client and fingerprint them for subsequent requests.
The security further scans each data point against a database of known fingerprints, including those of known bots. If your web scraper has fingerprint traits similar to those of known bots, Imperva will block you.
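The sketch below illustrates the general idea of turning a set of browser attributes into a single identifier and checking it against known bot fingerprints. It's a simplified model under our own assumptions, not Imperva's actual algorithm, and the attribute values are made up:

```python
import hashlib
import json


def fingerprint_id(attributes):
    """Hash a set of browser attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values for two clients.
regular_browser = {
    "platform": "Win32",
    "vendor": "Google Inc.",
    "languages": ["en-US", "en"],
    "hardwareConcurrency": 8,
    "screen": [1920, 1080],
    "plugins": 5,
    "webdriver": False,
}
# Same machine, but with traits typical of an unpatched automated browser.
automated_browser = dict(regular_browser, plugins=0, webdriver=True)

KNOWN_BOT_FINGERPRINTS = {fingerprint_id(automated_browser)}

print(fingerprint_id(regular_browser) in KNOWN_BOT_FINGERPRINTS)    # False
print(fingerprint_id(automated_browser) in KNOWN_BOT_FINGERPRINTS)  # True
```

Changing just two attributes produces a completely different identifier, which is why a single leaked automation signal, such as the webdriver flag, can be enough to match a scraper against a bot profile.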
You now know how Incapsula detects your scraper. Let's see the ways to bypass it.
Manual and Open-Source Incapsula Bypass Methods
Many web scrapers are familiar with Imperva, but traditional methods of bypassing it can quickly become obsolete. That said, there are still a few DIY Imperva Incapsula bypass techniques you can use with your scraper.
Method #1: Implement Fortified Headless Browsers
This method is suitable if the Incapsula-protected website requires complex automation, and you're already scraping it with a headless browser automation tool.
Here's the thing: while standard headless browsers can render JavaScript and emulate user behavior, they can't bypass anti-bot measures independently without fortification. Open-source fortified headless browsers, such as SeleniumBase with Undetected ChromeDriver, Playwright Stealth, and Puppeteer Stealth, are available.
We'll demonstrate how SeleniumBase with Undetected ChromeDriver works in Python by scraping the same Incapsula-protected website (Harvey Norman).
First, install the library using pip:
pip3 install seleniumbase
Import the Driver class from SeleniumBase and instantiate the WebDriver in headless and Undetected ChromeDriver modes. Open the target site with a 4-second reconnection delay to allow the site to load. Finally, take a screenshot of the site:
# pip3 install seleniumbase
from seleniumbase import Driver
# initialize driver with UC mode enabled
driver = Driver(uc=True, headless=True)
# open URL using UC mode with a 4-second reconnect time to bypass initial detection
driver.uc_open_with_reconnect(
    "https://www.harveynorman.com.au/",
    reconnect_time=4,
)
# save a screenshot of the loaded page to confirm the anti-bot was bypassed
driver.save_screenshot("screenshot.png")
# close the browser and end the session
driver.quit()
The saved screenshot showed the Harvey Norman page fully loaded, indicating that SeleniumBase got past the Imperva Incapsula anti-bot.
However, this method is still unreliable, as open-source, stealth tools carry a risk of low maintenance and still leak some bot signals. This limitation makes them easily detectable by modern anti-bot measures like Imperva, which are constantly updated.
That said, there are other techniques you can try.
Method #2: Scrape Archived or Cached Pages
Anti-bot systems, such as Imperva Incapsula, are typically triggered in real time when you request the live site. However, you can sidestep the protection altogether by scraping an archived version of your target, which isn't served through the anti-bot.
Although Google Cache has stopped offering cache services, you can still access snapshot versions of websites via web archives, such as the Internet Archive's Wayback Machine. This website features snapshots of various pages taken on different days and times.
Selecting any of those snapshots opens a previously accessed page that doesn't load directly through the Incapsula Imperva content delivery network (CDN).
For instance, to scrape the previous target site's archive, open the Internet Archive. Then enter its URL in the search bar at the top and press Enter.
You'll see snapshots of different dates highlighted in colored dots. Hover over any of them to load the snapshot times for that day. Select the most recent snapshot date and time to reduce the chance of getting outdated data. Click a snapshot period from the options to load the target website's archive.
The loaded archive returns a snapshot of the protected website.
Once the archive above has loaded, copy the snapshot URL from the address bar. Open that URL and extract its data with your scraper. The URL looks something like this:
https://web.archive.org/web/20240920195434/https://www.harveynorman.com.au/
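If you'd rather not pick a snapshot by hand, the Wayback Machine also exposes an availability endpoint that returns the snapshot closest to a given timestamp as JSON. Here's a minimal sketch using only the standard library. The helper names are our own, and the code assumes the response keeps its documented "archived_snapshots" and "closest" structure:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; the optional timestamp uses the YYYYMMDDhhmmss format."""
    query = {"url": page_url}
    if timestamp:
        query["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(query)}"


def closest_snapshot(payload):
    """Return the closest snapshot URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Query the Wayback Machine and return a snapshot URL, or None if none exists."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


# example usage:
# print(find_snapshot("https://www.harveynorman.com.au/", "20240920"))
```

The returned URL can then be passed to your scraper in place of the live site's URL.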
While the above method sometimes works, a limitation is that you may end up with outdated data if the website's content has changed since the last snapshot. The archive website may also implement an anti-bot measure to block your scraper from accessing snapshots.
Another way to bypass Incapsula is to use a smart proxy.
Method #3: Use Smart Proxies to Get Past Incapsula Imperva
Some websites only trigger the Imperva anti-bot if the request comes from a geo-restricted IP address, a suspicious one, or when an IP exceeds the permissible request limit.
A proxy routes your request through another IP address, making it appear as if it's from a different location or machine. You can use free or premium proxies for web scraping. However, free ones have a short lifespan and are unreliable.
The most reliable proxies for web scraping are premium residential ones. These proxies distribute traffic across a pool of IPs assigned to everyday internet users by network providers. This IP distribution model lets you mimic different users and reduces the risk of triggering the Incapsula anti-bot during web scraping.
Here's a basic setup for an authenticated premium proxy using Python's Requests library:
# pip3 install requests
import requests
# specify your proxy details
proxy = "http://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_ADDRESS>:<PROXY_PORT>"
# configure HTTP and HTTPS proxies
proxies = {
    "http": proxy,
    "https": proxy,
}
url = "https://httpbin.io/ip"
# make a request through the proxy
response = requests.get(url, proxies=proxies)
# ...other scraping activities
Read our guide on the best proxy providers for web scraping to see a list of top options.
However, the limitation of using only proxies is that you can still get blocked by other detection techniques deployed by Imperva Incapsula. Combining proxies with other techniques can increase the likelihood of success.
Best Method: Use ZenRows for Imperva Bypass
The easiest and most reliable way to bypass Imperva is via a web scraping solution, such as the ZenRows Universal Scraper API. This approach is highly recommended if you want to avoid the technical complexities and risks of the custom bypass methods discussed above.
Under the hood, ZenRows handles the technical aspect of emulating natural user behavior with proxy rotation, JavaScript rendering, and anti-bot auto-bypass features. It also provides an auto-managed, auto-scaled infrastructure that smartly adapts to Imperva's evolving anti-bot techniques. This allows you to focus on other tasks, such as data refinement, storage, and decision-making logic, rather than wasting time and resources fixing failed scrapers.
With ZenRows, you only need to make a single API call in any programming language, and ZenRows will help you bypass Imperva.
Let's see how ZenRows works by scraping an Incapsula-protected website like Harvey Norman.
Sign up and go to the ZenRows Request Builder. Input your target URL in the link box and activate Premium Proxies and JS Rendering.
Select your programming language (in this case, Python) and choose the API connection mode. Copy and paste the generated code into your scraper file.
The generated Python code should look like this:
# pip install requests
import requests
url = "https://www.harveynorman.com.au/"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
Here's the response, showing the website's title with omitted content:
<html>
<head>
<!-- ... -->
<title>Computers, Electrical, Furniture & Bedding | Harvey Norman</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
</body>
</html>
Congratulations! 🎉 You just bypassed Imperva Incapsula using ZenRows.
Troubleshooting Common Imperva Blocks: When Manual Methods Fail
When implementing manual Incapsula/Imperva bypass techniques, you may encounter additional blocks. The table below summarizes the troubleshooting steps to fix these blocks:
| Imperva Block Issue | Possible Cause | Quick Fixes |
|---|---|---|
| 403 Forbidden Error | Request received but denied due to bot-like signals detected by the anti-bot. | - Use realistic browser headers (User-Agent, vendor, referrer) - Patch stealth browsers (e.g., adjust navigator to hide automation signals) - Use clients with browser-like TLS fingerprints (JA3/JA4) - Switch to residential proxies with IP rotation - Retry requests (e.g., exponential backoff) |
| 429 Too Many Requests | Rate limit exceeded by sending too many requests at once. | - Limit concurrent requests - Add random delays - Schedule scraping during off-peak hours - Rotate IP addresses per request |
| Unsupported IP/Location | Request blocked due to originating from an unsupported country/location. | - Use residential proxies with geo-location features - Switch to a supported country’s IP |
| Intermittent Blocks | Blocks occur due to expired cookies or a degraded trust score. | - Randomize user interactions (e.g., human-like mouse movements with libraries like Ghost Cursor) - Generate random browser fingerprints - Change IP address - Regularly test your stealth tool's reliability and switch stealth tools if needed |
| CAPTCHA Block Page | Blocked by Incapsula CAPTCHA challenge. | - Solve with CAPTCHA-solving tools like 2Captcha - Try to automate solving with stealth headless browsers like SeleniumBase |
Considering that each recommended custom technique has its own strengths, it's important to combine them to increase the likelihood of success.
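As one example of combining fixes from the table, here's a sketch of retrying 403 and 429 responses with exponential backoff and random jitter. The function names and default values are assumptions for illustration. The fetch argument is any function that takes a URL and returns an object with a status_code attribute, such as a Requests response:

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Full-jitter exponential backoff: a random wait up to base * 2**attempt, capped."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_retries(fetch, url, max_attempts=5, retry_statuses=(403, 429), sleep=time.sleep):
    """Call fetch(url) until it returns a non-retryable status or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = fetch(url)
        if response.status_code not in retry_statuses:
            break
        if attempt < max_attempts - 1:  # no need to wait after the final attempt
            sleep(backoff_delay(attempt))
    return response
```

Retrying with the same IP and fingerprint rarely helps after a 403, so in practice you'd pair this with a proxy or session change inside fetch.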
However, using a web scraping API, such as ZenRows, remains the most reliable solution for bypassing Imperva at scale with zero self-management or manual infrastructure setup.
Conclusion
This guide showed you how Imperva Incapsula works and covered four approaches for bypassing it. While the open-source and free methods can help you evade simple anti-bot measures, they don't guarantee success against advanced anti-bot measures.
ZenRows, an all-in-one web scraping solution, is the most reliable way to bypass Imperva Incapsula at scale. ZenRows enables you to focus on the vital parts of your business process by handling time-consuming tasks, such as anti-bot bypass, proxy rotation, and JavaScript rendering for you.
Try ZenRows for free now or speak with sales!
Frequent Questions
What is the Error Code for Imperva Blocks?
The error code displayed when you're blocked by the Incapsula anti-bot varies, but it's typically a 403 Forbidden error. However, note that you might still receive a valid 200 response code from an Incapsula-protected page if you encounter a CAPTCHA, as such pages return a valid HTML response.
What is the Best Method to Bypass Imperva Incapsula?
The most effective way to bypass Imperva Incapsula is to use a web scraping API. These are more reliable than open-source or free solutions since they're automatically maintained and auto-scaled to match your scraping requirements.
Is it Legal to Scrape an Incapsula-Protected Website?
Before scraping an Incapsula-protected website, ensure you carefully review and comply with the site's policies and any applicable legal agreements. Avoid scraping private or sensitive data and use the extracted data responsibly. Overall, ensure you follow best practices for web scraping.
What is an Incapsula block?
An Imperva Incapsula block occurs when Imperva's anti-bot measures detect a request as suspicious or automated. It then triggers background checks, such as JavaScript challenges, and visible verification steps, like CAPTCHAs, to verify the request's authenticity. If the request fails these challenges, Imperva displays a block page to prevent access to the target site. An Incapsula block aims to identify and block bots, scrapers, and unauthorized access to protected website resources.