How to Bypass Imperva Incapsula for Web Scraping (2025)

Rubén del Campo
Updated: November 12, 2024 · 6 min read

Does your scraper keep hitting the Imperva anti-bot screen? Imperva Incapsula is among the most popular anti-scraping measures on the internet, so bypassing it is often necessary to extract data from protected sites.

We've got you covered! In this guide, you'll learn how to bypass Imperva protection using four tested and trusted methods:

  • Use a web scraping API.
  • Implement fortified headless browsers.
  • Scrape archived or cached pages.
  • Use smart proxies.

We'll use Harvey Norman, an Imperva Incapsula-protected website, to show how each method works. But first, let's learn more about the system itself.

What Is Imperva (Incapsula)?

Imperva Incapsula is a web application firewall (WAF) that uses advanced web security measures to protect websites against attacks such as DDoS by blocking traffic that doesn't seem human.

Unfortunately, that includes all sorts of bots regardless of their intentions. The Imperva firewall acts as an intermediary between your browser/scraper and the target website's server.

Common Imperva Block Page Messages

Imperva typically displays an anti-bot page to block web scraping attempts, similar to other WAFs like Akamai and PerimeterX. If you're scraping with an HTTP client, the block page often comes with an error such as a 403 Forbidden status code. However, you might also get a 200 OK status code since the block page itself is a valid HTML response.

Here are common signs that indicate you've been blocked by Imperva:

  • Incapsula incident ID embedded in an iFrame.
  • Powered by Imperva text returned with a CAPTCHA.
  • x-cdn: Imperva in the response headers.
  • _Incapsula_Resource in the script and iframe tags.
  • subject=WAF Block Page in the response HTML.
  • visid_incap_ and incap_ses in the Set-Cookie header field.
  • X-Iinfo in the response headers.
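The markers listed above can be checked programmatically. Here's a minimal sketch that scans a response's status code, headers, and body for them. The function names and the way the checks are combined are assumptions made for illustration, not an official detection rule. Keep in mind that the cookies and headers only show that Imperva sits in front of the site, while the body markers point to an actual block page.

```python
# Marker strings come from the list above; the function names and the
# (status_code, headers, body) interface are illustrative assumptions.
BODY_MARKERS = (
    "Incapsula incident ID",
    "Powered by Imperva",
    "_Incapsula_Resource",
    "subject=WAF Block Page",
)
COOKIE_PREFIXES = ("visid_incap_", "incap_ses")


def imperva_signals(headers, body):
    """Return the Imperva markers found in the response headers and body."""
    # header names are case-insensitive, so normalize them first
    lowered = {name.lower(): value for name, value in headers.items()}
    found = []

    if "imperva" in lowered.get("x-cdn", "").lower():
        found.append("x-cdn: Imperva")
    if "x-iinfo" in lowered:
        found.append("X-Iinfo")

    cookies = lowered.get("set-cookie", "")
    found.extend(prefix for prefix in COOKIE_PREFIXES if prefix in cookies)
    found.extend(marker for marker in BODY_MARKERS if marker in body)
    return found


def looks_blocked(status_code, headers, body):
    """Heuristic: a block-page marker in the body, or a 403 served by Imperva."""
    if any(marker in body for marker in BODY_MARKERS):
        return True
    return status_code == 403 and bool(imperva_signals(headers, body))
```

With the Requests library, you could call it as `looks_blocked(response.status_code, response.headers, response.text)` and retry or switch strategy when it returns `True`.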

Bypassing Imperva Incapsula is possible. But first, you need to understand its detection techniques.

How Does Imperva Incapsula Detect Bots?

When a user tries to access an Incapsula-protected website, the WAF receives and analyzes the request before getting the content from the source server. Imperva then returns a trust score based on the results of this analysis.

However, due to advanced bot detection techniques, web scrapers rarely get past the initial analysis stage. Let's discuss Imperva's detection mechanisms below.

HTTP Request Analysis

Scanning the request headers is one of Imperva's initial detection methods. Header fields, such as the User-Agent, contain information that tells the server whether a client is a human.

The web application firewall (WAF) scans incoming requests against a database of known bot signatures or based on the website's header policies. Any deviation from the expected header values can result in detection and subsequent blocking. Browsers typically send headers in a specific order. If your request header strings deviate from the expected order, it can expose you as a web scraper.

Additionally, the anti-bot checks your HTTP version. Since most modern browsers rely on HTTP/2 or HTTP/3, using an older version like HTTP/1.0 or HTTP/1.1 can signal bot-like activity.

To reduce the chances of detection via HTTP analysis, use the recommended request headers for web scraping. Then, use HTTP clients that support HTTP/2+ protocols.
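As a rough illustration of that advice, the sketch below declares browser-like headers in a fixed order and sends them with HTTPX, which can negotiate HTTP/2. The User-Agent string, the header values, their order, and the function names are all assumptions chosen for the example, not values Imperva is known to expect. HTTPX also merges these headers with its own defaults, so the order on the wire may differ from the order declared here.

```python
# Illustrative values only: the User-Agent string, header order, and
# function names are assumptions for this sketch.
def browser_like_headers():
    """Return headers loosely modeled on what a Chromium browser sends."""
    # dicts preserve insertion order, so this is the order we declare
    return {
        "user-agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/130.0.0.0 Safari/537.36"
        ),
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "accept-language": "en-US,en;q=0.9",
        "accept-encoding": "gzip, deflate",
        "upgrade-insecure-requests": "1",
    }


def fetch(url):
    """Fetch a URL, using HTTP/2 when the server supports it."""
    # requires: pip3 install "httpx[http2]"
    import httpx

    with httpx.Client(http2=True, headers=browser_like_headers(), timeout=30) as client:
        response = client.get(url)
        # http_version reports the protocol that was actually negotiated
        return response.http_version, response.status_code
```

Headers and the HTTP version are only one of several signals, so this alone is unlikely to get past a strict Imperva configuration.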

IP Fingerprinting

Incapsula collects IP data from website visitors and compares it to a known database of malicious IPs. If your address has a history of hostile attacks or is associated with botnets, it'll gain a poor reputation, and subsequent requests from it will be banned.

The anti-bot also analyzes traffic data, such as the source and request rate and frequency, to identify unnatural user behavior. So, sending multiple requests within a short period or regularly violating rate limits can result in an IP ban, which can be temporary or permanent.

Using proxies to mask your IP address can boost your scraping activities. However, avoid IPs from data centers or shared ones, as they have a low reputation.
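Since request rate and frequency feed into this analysis, one simple precaution is to space requests out at irregular intervals. The helper below is a minimal sketch of that idea. The `paced` name and the default delay range are assumptions, and the right values depend on the target site's limits.

```python
import random
import time


def paced(items, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them."""
    for index, item in enumerate(items):
        if index > 0:
            # randomized pause so requests don't arrive at a fixed rhythm
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

You'd wrap your URL list with it, for example `for url in paced(urls): ...`, and send each request inside the loop.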

Behavior-Based Detection Techniques

Behavior-based detection methods involve behavioral analysis performed on the server and client sides.

The server-side behavioral analysis approach involves page navigation checks to monitor page interaction timing, patterns, and frequency. The client-side method checks browser/client-based user interactions, such as mouse clicks and movements, keyboard inputs, scrolling patterns, etc.

Imperva collects this behavioral data in real time using obfuscated JavaScript challenges that run in the client and send the results back to its servers for analysis. Once Imperva spots unusual behavior patterns, it blocks the request.

You can reduce behavioral detection using headless browser automation tools like Selenium, Playwright, or Puppeteer.
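To make the client-side part concrete, here's a small sketch that moves the mouse along a slightly irregular path instead of jumping straight to a target. The helper names, step count, jitter, and timing values are assumptions for illustration. `move_like_human` expects a Playwright `page` object and only relies on its `page.mouse.move(x, y)` method. There's no guarantee that this kind of simulation satisfies Imperva's behavioral checks.

```python
import asyncio
import random


def mouse_path(start, end, steps=20, jitter=3.0):
    """Return points from start to end with small random offsets."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        # ease-in-out so the pointer accelerates and then slows down
        eased = t * t * (3 - 2 * t)
        x = x0 + (x1 - x0) * eased + random.uniform(-jitter, jitter)
        y = y0 + (y1 - y0) * eased + random.uniform(-jitter, jitter)
        points.append((x, y))
    # finish exactly on the target
    points[-1] = (float(x1), float(y1))
    return points


async def move_like_human(page, start, end):
    """Move a Playwright page's mouse along the jittered path."""
    for x, y in mouse_path(start, end):
        await page.mouse.move(x, y)
        # short, irregular pause between movements
        await asyncio.sleep(random.uniform(0.01, 0.04))
```

Inside a Playwright script, you could call `await move_like_human(page, (50, 60), (400, 300))` before clicking an element.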

Browser Fingerprinting

Imperva also uses browser fingerprinting as part of its detection techniques to create a unique fingerprint for each client by collecting specific information. Information gathered includes operating system type and version, browser type, vendor, installed plugins, language, hardware concurrency, screen resolution, etc.

Clients typically present slight differences in their fingerprints, which makes each unique. Imperva leverages the differences between these data points to identify each client and fingerprint them for subsequent requests.

The security system then checks each fingerprint against a database of known fingerprints, including those of known bots. If your web scraper has fingerprint traits similar to those of known bots, Imperva will block you.
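The idea of turning collected attributes into an identifier can be sketched in a few lines. This is a simplified illustration, not Imperva's actual algorithm: the attribute names, the sample values, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import json


def fingerprint_id(attributes):
    """Hash a dict of client attributes into a short identifier."""
    # sort keys so the same attributes always serialize identically
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# hypothetical attribute sets for two clients
desktop = {
    "platform": "Win32",
    "language": "en-US",
    "hardwareConcurrency": 8,
    "screen": "1920x1080",
    "plugins": 5,
}
# the same machine with two attributes changed
automated = {**desktop, "plugins": 0, "screen": "800x600"}

print(fingerprint_id(desktop))
print(fingerprint_id(automated))
```

Changing even one attribute produces a different identifier, which is why a scraper whose attributes match a known automation profile is easy to flag.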

TLS Fingerprinting

TLS (Transport Layer Security) fingerprinting is another detection technique that Imperva uses to analyze and fingerprint server-client communication. TLS fingerprinting starts with a TLS handshake, where the client sends a "Client Hello" message to the server.

During the "Client Hello" phase, the client provides supported parameters, including the TLS version, cipher suites, extensions, signature algorithms, etc.

Imperva uses the details in the "Client Hello" message to generate a hash or fingerprint. This fingerprint can then be matched against a database of known fingerprints to identify the client type or detect unusual patterns.

TLS fingerprinting is more advanced than browser fingerprinting. For instance, even if you spoof HTTP headers like the User-Agent to mimic a real browser, the underlying TLS fingerprint often remains unchanged unless explicitly configured using custom TLS bypass libraries.
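One widely used way to hash a "Client Hello" is the open JA3 method: five fields (TLS version, cipher suites, extensions, elliptic curves, and point formats) are joined into a string and hashed with MD5. The sketch below follows that layout. Whether Imperva uses JA3 or a scheme of its own isn't stated here, so treat it as an illustration of the concept, and the sample values in the usage note are made up.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string: comma-separated fields, values joined by '-'."""
    fields = (ciphers, extensions, curves, point_formats)
    parts = [str(version)] + ["-".join(str(v) for v in field) for field in fields]
    return ",".join(parts)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3-style string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

For example, `ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])` returns a 32-character hash. None of the inputs come from HTTP headers, which is why changing the User-Agent leaves this fingerprint untouched.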

You now know how Incapsula detects your scraper. Let's see four ways to bypass it.

Method #1: Use a Web Scraping API for Incapsula Bypass

Using a web scraping API is the easiest and most effective way to bypass Imperva Incapsula. It handles the technical aspect of emulating natural user behavior with proxy rotation, JavaScript rendering, and anti-bot auto-bypass features.

ZenRows is one of the top web scraping APIs for extracting data from any website, regardless of the security level or your project's scale. You only need to make a single API call using any programming language, and ZenRows will help you bypass Incapsula Imperva.

Let's see how ZenRows works by scraping an Incapsula-protected website like Harvey Norman.

Sign up to open the ZenRows Request Builder. Input your target URL in the link box and activate Premium Proxies and JS Rendering. Select your programming language (Python, in this case) and choose the API connection mode.

Copy and paste the generated code into your scraper file.

[Screenshot: building a scraper with the ZenRows Request Builder]

Since we've selected Python, you'll need to install the Requests library using pip:

Terminal
pip3 install requests

The generated Python code should look like this:

Example
# pip install requests
import requests

url = "https://www.harveynorman.com.au/"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

Here's the response, showing the website's title with omitted content:

Output
<html>
    <head>
        <!-- ... -->
        <title>Computers, Electrical, Furniture &amp; Bedding | Harvey Norman</title>
        <!-- ... -->
    </head>
    <body>
        <!-- ... -->
    </body>
</html>

Perfect! You just bypassed Imperva Incapsula using ZenRows. 
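Once you have the HTML, you'll usually want to pull specific fields out of it. As a minimal sketch, the snippet below extracts the page title with Python's built-in `html.parser` module. The `TitleParser` class and `extract_title` function are names made up for this example, and in a real project you might prefer a dedicated parsing library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```

With the scraper above, you'd call `extract_title(response.text)` instead of printing the whole document.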

Still, it's worth exploring the other methods to get the full picture, since each has its own strengths and trade-offs.

Method #2: Implement Fortified Headless Browsers

This method is suitable if your project has complex automation requirements and you need to scrape an Incapsula-protected page with a headless browser automation tool.

Here's the thing: base headless browsers can render JavaScript and emulate user behavior, but they can't bypass anti-bot measures independently without fortification. 

Open-source fortified headless browsers, such as Playwright Stealth, are available. Although they hide some detectable bot-like characteristics, they still leak some bot-like details and are unreliable, especially when dealing with sophisticated anti-bots like Incapsula.

For example, the previous Incapsula-protected website (Harvey Norman) blocks Playwright despite adding the stealth plugin.

To try it yourself, install Playwright and its stealth plugin. Then download its browser binaries:

Terminal
pip3 install playwright playwright-stealth
playwright install

Now, import those libraries and try accessing the protected page with the following code that screenshots the home page:

Example
# pip3 install playwright playwright-stealth
# playwright install
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def scraper():
    # launch the Playwright instance
    async with async_playwright() as playwright:
        # launch the browser
        browser = await playwright.chromium.launch()

        # create a new page
        page = await browser.new_page()

        # apply stealth to the page
        await stealth_async(page)

        # navigate to the desired URL
        await page.goto("https://www.harveynorman.com.au/")

        # wait for any dynamic content to load
        await page.wait_for_load_state("networkidle")

        # take a screenshot of the page
        await page.screenshot(path="screenshot.png")

        # close the browser
        await browser.close()


# run the scraper function
asyncio.run(scraper())

The scraper got blocked, and the screenshot shows the Incapsula protection page instead of the home page.

Playwright-Stealth got blocked due to missing patches, such as incomplete fingerprints. How can we fortify Playwright better to bypass Imperva during web scraping? 

That's where the ZenRows Scraping Browser comes in handy. It fortifies your Playwright scraper with essential browser fingerprints and pre-integrated residential proxies, significantly increasing your chances of bypassing the Incapsula anti-bot. It's also highly scalable, running the browser instance in the cloud without extra memory usage from your local machine.

To use it, sign up to load the ZenRows Request Builder. Then, go to the Scraping Browser dashboard and copy your Browser URL.

[Screenshot: ZenRows Scraping Browser dashboard]

Connect Playwright's Chromium over the Chrome DevTools Protocol (CDP) using the copied browser connection URL. Then, screenshot the home page after opening the target URL. Here's the updated Playwright scraper:

Example
# pip3 install playwright
# playwright install
import asyncio
from playwright.async_api import async_playwright

async def main():
    # launch the Playwright instance
    async with async_playwright() as p:

        # set the connection URL
        connectionURL = "wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>"

        # connect to the remote browser over CDP using the connection URL
        browser = await p.chromium.connect_over_cdp(connectionURL)

        # create a new page
        page = await browser.new_page()

        # navigate to the desired URL
        await page.goto("https://www.harveynorman.com.au/")

        # wait for any dynamic content to load
        await page.wait_for_load_state("networkidle")

        # take a screenshot of the page
        await page.screenshot(path="screenshot.png")

        # close the browser
        await browser.close()


# run the main function
asyncio.run(main())

The ZenRows-fortified Playwright scraper accesses the protected page as shown in the screenshot below:

[Screenshot: Harvey Norman home page loaded through the ZenRows Scraping Browser]

That works! Let's move to the other techniques.

Method #3: Scrape Archived or Cached Pages

Anti-bot systems like Imperva Incapsula are typically triggered in real-time. However, you can bypass the protection altogether by scraping your target's archived version, which doesn't have the anti-bot measure.

Although Google has discontinued its cache service, you can still access snapshots of websites via web archives, such as the Internet Archive's Wayback Machine. This website contains snapshots of various pages on different days and times.

Selecting any of those snapshots brings up a previously accessed page that doesn't open directly through the Incapsula Imperva content delivery network (CDN).

For instance, to scrape the previous target page archive, open Internet Archive. Then, enter the target URL into the search bar at the top and hit Enter. 

You'll see snapshots of different dates highlighted in colored dots. Hover over any of them to load the snapshot times for that day. Select the most recent snapshot date and time to reduce the chance of getting outdated data. Click a snapshot period from the options to load the target website's archive.

[Screenshot: Harvey Norman snapshots in the Internet Archive calendar]

The loaded archive returns a snapshot of the protected website, as shown:

[Screenshot: archived version of the Harvey Norman home page]

Once the above archive loads, copy the snapshot URL from the address bar. Open that URL and extract its data with your scraper. The URL looks something like this:

Example
https://web.archive.org/web/20240920195434/https://www.harveynorman.com.au/

While the above method works sometimes, one limitation is that you might end up with outdated data if the website's content has changed since the last snapshot. The archive website may also implement an anti-bot measure to block your scraper from accessing snapshots.
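If you'd rather not pick snapshots by hand, the Internet Archive also exposes an availability endpoint that returns the snapshot closest to a given timestamp. The sketch below builds snapshot URLs in the format shown above and queries that endpoint with the standard library. The helper names are assumptions, and you should check the Internet Archive's documentation and usage policies before automating requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def snapshot_url(timestamp, target_url):
    """Build a Wayback Machine snapshot URL (timestamp: YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{target_url}"


def closest_snapshot(payload):
    """Return the closest snapshot URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(target_url, timestamp=None):
    """Ask the availability API for the snapshot closest to a timestamp."""
    query = {"url": target_url}
    if timestamp:
        query["timestamp"] = timestamp
    with urlopen(f"{AVAILABILITY_API}?{urlencode(query)}", timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `find_snapshot("https://www.harveynorman.com.au/", "20240920")` returns a snapshot URL you can pass to your scraper, or `None` if the page hasn't been archived.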

Another way to bypass Incapsula is to use a smart proxy.

Method #4: Use Smart Proxies to Get Past Incapsula Imperva

Some websites only trigger the Imperva anti-bot if the request comes from a geo-restricted IP, a suspicious one, or when an IP exceeds the permissible request limit.

A proxy routes your request through another IP, making it appear as if it's from a different location or machine. You can use free or premium proxies for web scraping. However, free ones have a short lifespan and are unreliable.

The most reliable proxies for web scraping are premium residential ones. These proxies distribute traffic over a pool of IPs assigned to daily internet users by network providers.

This IP distribution model lets you mimic different users and reduces the chance of hitting an IP-triggered Incapsula anti-bot during web scraping. Read our guide on the best proxy providers for web scraping to see a list of top options.
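To show what distributing traffic over a pool looks like in code, here's a minimal rotation sketch using the Requests library. The proxy addresses are placeholders, and the function names are assumptions. Many residential providers rotate IPs for you behind a single endpoint, in which case you'd pass just that one address.

```python
from itertools import cycle

# Placeholder endpoints: replace them with the ones from your proxy provider.
PROXY_POOL = [
    "http://<USERNAME>:<PASSWORD>@proxy1.example.com:8000",
    "http://<USERNAME>:<PASSWORD>@proxy2.example.com:8000",
    "http://<USERNAME>:<PASSWORD>@proxy3.example.com:8000",
]


def proxy_rotation(pool):
    """Yield Requests-style proxy mappings, cycling through the pool."""
    for proxy in cycle(pool):
        yield {"http": proxy, "https": proxy}


def fetch_with_rotation(urls, pool=PROXY_POOL):
    """Request each URL through the next proxy in the pool."""
    # pip3 install requests
    import requests

    results = {}
    # zip stops when the URL list is exhausted
    for url, proxies in zip(urls, proxy_rotation(pool)):
        response = requests.get(url, proxies=proxies, timeout=30)
        results[url] = response.status_code
    return results
```

Each URL goes out through the next address in the pool, so no single IP carries all the traffic.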

The limitation of using only proxies is that you can still get blocked by advanced anti-bot measures, especially those using multiple bot detection techniques beyond IP reputation. You need extra measures to bypass anti-bots.

Conclusion

This step-by-step guide showed you how Incapsula Imperva works and how to bypass it using four approaches:

  • Use a web scraping API: The most reliable method to bypass the Incapsula anti-bot page.
  • Implement a fortified headless browser: Recommended if your scraping task requires complex automation.
  • Scrape the target website's archive: Retrieves content snapshots without triggering the anti-bot, though the data may be outdated.
  • Integrate smart proxies: Helps avoid IP-triggered Incapsula CAPTCHA. It doesn't work against advanced Incapsula implementations.

ZenRows, an all-in-one web scraping solution, is the most reliable way to bypass Imperva Incapsula at scale. It offers many benefits and features, including anti-bot bypass, JavaScript rendering, proxy rotation, super-fortified scraping browsers with advanced fingerprint management, and more.

Try ZenRows for free now without a credit card!