In-Depth Guide to Bypass Incapsula Imperva: 4 Methods

Rubén del Campo
October 2, 2024 · 6 min read

Does your scraper keep hitting the Imperva anti-bot screen? Incapsula Imperva is among the most popular anti-scraping measures on the internet, meaning bypassing it has become necessary to extract data successfully. 

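We've got you covered! In this guide, you'll learn how to bypass Incapsula (now Imperva) using four tested methods:

  • Use a web scraping API.
  • Implement fortified headless browsers.
  • Scrape archived or cached pages.
  • Use smart proxies.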

We'll use Harvey Norman, an Imperva Incapsula-protected website, to show how each method works. But first, let's learn more about the system itself.

How Does Imperva Incapsula Detect Bots?

Imperva Incapsula is a web application firewall (WAF) that protects websites against attacks such as DDoS by blocking traffic that doesn't seem human. Unfortunately, that includes all sorts of bots, regardless of their intentions.

The Imperva firewall acts as an intermediary between your browser/scraper and the target website's server. When a user tries to access an Incapsula-protected website, its WAF receives and analyzes the request before requesting the content from the source server.

However, scrapers rarely make it beyond the analysis stage because of three kinds of detection approaches: signature-based detection, behavioral analysis, and client fingerprinting. These checks will block your automated script, resulting in errors like the Imperva/Incapsula 403.
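Before tuning a scraper, it helps to recognize when a response is a block rather than real content. The snippet below is a minimal sketch of such a check. The helper name `looks_blocked` is hypothetical, and the text markers are assumptions based on strings commonly seen on Incapsula block and challenge pages, so adjust them to what your target actually returns.

```python
# Assumed markers: strings often present in Incapsula block/challenge pages.
BLOCK_MARKERS = ("incapsula incident id", "_incapsula_resource")


def looks_blocked(status_code: int, body: str) -> bool:
    """Return True if a response looks like an Incapsula block (heuristic)."""
    # A 403 status is the error mentioned above.
    if status_code == 403:
        return True
    # Some challenge pages return 200 but embed an Incapsula script or message.
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Usage with Requests (not executed here):
# response = requests.get("https://www.harveynorman.com.au/")
# print(looks_blocked(response.status_code, response.text))
```

This is only a heuristic: a page can mention these strings without being a block page, so treat a positive result as a hint to inspect the response.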

Here's what the Imperva Incapsula anti-bot page looks like:

Incapsula Antibot Block Page

Bypassing Imperva Incapsula is possible. But first, you need to understand its detection mechanisms. Let's discuss them below.

Signature-Based Detection Techniques

Signature-based detection methods rely on predefined patterns, or signatures, to identify bots from humans. Here are the common ones:

  • HTTP request headers: Every request's HTTP headers carry information that hints at whether its sender is a human. For instance, if you use a command-line tool (e.g., cURL) or a headless browser with its default settings, headers such as the User-Agent will easily identify you as a bot.
  • IP reputation: Incapsula collects IP data from website visitors and compares it to a known database of malicious IPs. If your address has a history of hostile attacks or is associated with botnets, it'll gain a poor reputation, and subsequent requests from it will get blocked.
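To make the header signature concrete, here's a small sketch that compares the User-Agent that Python's Requests library sends by default with a browser-like one. The header values in `BROWSER_HEADERS` are illustrative assumptions, not a guaranteed bypass, and nothing is sent over the network: the request is only prepared so you can inspect it.

```python
import requests

# The User-Agent Requests sends when you don't set one (python-requests/<version>)
default_ua = requests.utils.default_headers()["User-Agent"]
print("Default:", default_ua)

# Illustrative browser-like headers (example values, assumed for this sketch)
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Apply the headers to a session and prepare (but don't send) a request
session = requests.Session()
session.headers.update(BROWSER_HEADERS)
prepared = session.prepare_request(
    requests.Request("GET", "https://www.harveynorman.com.au/")
)
print("Custom:", prepared.headers["User-Agent"])
```

Changing headers only addresses this one signature. The behavioral and fingerprinting checks described next still apply.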

Behavior-Based Detection Techniques

Behavior-based techniques involve behavioral analysis performed on the server side. They include:

  • Request analysis: The anti-bot analyzes traffic data, such as the source, rate, and frequency of requests, to identify unnatural user behavior. For instance, your bot will look suspicious and get blocked if you send too many requests at an unusual rate.
  • Page navigation analysis: This technique monitors page interaction timing, patterns, and frequency. This way, Incapsula can identify unusual navigation patterns and block corresponding requests.
  • Client interaction analysis: User interactions, like mouse clicks and keyboard inputs, say a lot about the client, so Imperva Incapsula obtains this type of data using obfuscated scripts and blocks suspicious requests.
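Since request rate and frequency are among the signals above, one common mitigation is to space requests out with randomized pauses. The following is a minimal sketch under that assumption. The `crawl` helper, its parameters, and the delay range are hypothetical choices for illustration, and random delays alone won't defeat the other checks.

```python
import random
import time


def crawl(urls, fetch, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # A random pause avoids a fixed, machine-like request rhythm
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Usage (not executed here), where fetch is any function that downloads a URL:
# pages = crawl(["https://www.harveynorman.com.au/"], fetch=lambda u: requests.get(u).text)
```

Passing `sleep` as a parameter keeps the function easy to test without actually waiting.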

Client Fingerprinting Techniques

Client fingerprinting involves analyzing clients' characteristics to identify bots. Let's elaborate with some examples:

  • Device fingerprinting: This technique identifies the user's device. Imperva Incapsula gathers information about the client's attributes (OS, browser type and version, screen resolution, installed fonts, etc.) and combines it to generate a unique user fingerprint.
  • JavaScript challenges: Since a client's inability to execute JavaScript is a clear sign of bot traffic, Incapsula uses various challenges to test JavaScript execution.
  • CAPTCHAs: These are challenges designed to let humans through and keep bots out. CAPTCHAs often come as puzzles, asking the user to complete an action before accessing a page. They were the leading bot detectors, but they're now being replaced or merged with advanced technologies for more robust protection.

You now know how Imperva Incapsula detects your scraper. Let's see the 4 ways to bypass it.

Method #1: Use a Web Scraping API

Using a web scraping API is the easiest and most effective way to bypass Imperva Incapsula. It handles the technical aspect of emulating natural user behavior with proxy rotation, JavaScript rendering, and anti-bot auto-bypass features.

ZenRows is one of the top scraping APIs for extracting data from any website, regardless of the anti-bot security level or your project's scale. You only need to make a single API call using any programming language, and ZenRows will help you bypass Incapsula Imperva.

Let's see how ZenRows works by scraping an Incapsula-protected website like Harvey Norman.

Sign up to open the ZenRows Request Builder. Input your target URL in the link box and activate Premium Proxies and JS Rendering. Select your programming language (Python, in this case) and choose the API connection mode.

Copy and paste the generated code into your scraper file.

building a scraper with zenrows

Since we've selected Python, you'll need to install the Requests library using pip:

Terminal
pip3 install requests

The generated Python code should look like this:

Example
# pip install requests
import requests

url = "https://www.harveynorman.com.au/"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

Here's the output, showing the website's title with omitted content:

Output
<html>
    <head>
        <!-- ... -->
        <title>Computers, Electrical, Furniture &amp; Bedding | Harvey Norman</title>
        <!-- ... -->
    </head>
    <body>
        <!-- ... -->
    </body>
</html>

Perfect! You just bypassed Imperva Incapsula using ZenRows. 

Still, it's worth exploring the other methods to get the full picture, since each has its own strengths and limitations.

Method #2: Implement Fortified Headless Browsers

This method is suitable if you need to scrape an Incapsula-protected page with a headless browser because of complex automation requirements.

Here's the thing: base headless browsers can render JavaScript and emulate user behavior, but they can't bypass anti-bot measures on their own without fortification.

Open-source fortified headless browsers, such as Playwright Stealth, are available. Although they hide some detectable bot-like characteristics, they still leak some bot-like details and are unreliable, especially when dealing with sophisticated anti-bots like Incapsula.

For example, the previous Incapsula-protected website (Harvey Norman) blocks Playwright despite adding the stealth plugin.

To try it yourself, install Playwright and its stealth plugin. Then download its browser binaries:

Terminal
pip3 install playwright playwright-stealth
playwright install

Now, import those libraries and try accessing the protected page with the following code that screenshots the home page:

Example
# pip3 install playwright playwright-stealth
# playwright install
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def scraper():
    # launch the Playwright instance
    async with async_playwright() as playwright:
        # launch the browser
        browser = await playwright.chromium.launch()

        # create a new page
        page = await browser.new_page()

        # apply stealth to the page
        await stealth_async(page)

        # navigate to the desired URL
        await page.goto("https://www.harveynorman.com.au/")

        # wait for any dynamic content to load
        await page.wait_for_load_state("networkidle")

        # take a screenshot of the page
        await page.screenshot(path="screenshot.png")

        # close the browser
        await browser.close()


# run the main function
asyncio.run(scraper())

The scraper got blocked with the following Incapsula protection page:

Harvey Scraper Blocked

Playwright-Stealth got blocked due to missing patches, such as incomplete fingerprints. How can we fortify Playwright better to bypass that block? 

That's where the ZenRows Scraping Browser comes in handy. It fortifies your Playwright scraper with essential browser fingerprints and pre-integrated residential proxies, significantly increasing your chances of bypassing the Incapsula anti-bot. It's also highly scalable because the browser instance runs in the cloud rather than consuming memory on your local machine.

To use it, sign up to load the ZenRows Request Builder. Then, go to the Scraping Browser dashboard and copy your Browser URL.

ZenRows scraping browser

Connect Playwright's Chromium over the Chrome DevTools Protocol (CDP) using the copied browser connection URL. Then, screenshot the home page after opening the target URL. Here's the updated Playwright scraper:

Example
# pip3 install playwright
# playwright install
import asyncio
from playwright.async_api import async_playwright

async def main():
    # launch the Playwright instance
    async with async_playwright() as p:

        # set the connection URL
        connectionURL = "wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>"

        # connect to the remote browser over CDP using the connection URL
        browser = await p.chromium.connect_over_cdp(connectionURL)

        # create a new page
        page = await browser.new_page()

        # navigate to the desired URL
        await page.goto("https://www.harveynorman.com.au/")

        # wait for any dynamic content to load
        await page.wait_for_load_state("networkidle")

        # take a screenshot of the page
        await page.screenshot(path="screenshot.png")

        # close the browser
        await browser.close()


# run the main function
asyncio.run(main())

The ZenRows-fortified Playwright scraper accesses the protected page as shown in the screenshot below:

Harvey Bypassed

That works! Let's move to the other techniques.

Method #3: Scrape Archived or Cached Pages

Anti-bot systems like Imperva Incapsula are typically triggered in real time. However, you can bypass the protection altogether by scraping your target's archived version, which doesn't have the anti-bot measure.

Although Google has stopped offering cached pages, you can still access snapshot versions of websites via web archives, such as the Internet Archive's Wayback Machine. It stores snapshots of pages captured on different days and times.

Selecting any of those snapshots brings up a previously captured copy of the page, which is served by the archive rather than through the Incapsula Imperva content delivery network (CDN).

For instance, to scrape the previous target page archive, open the Internet Archive. Then, enter the target URL into the search bar at the top and hit Enter.

You'll see the dates that have snapshots highlighted with colored dots. Hover over any of them to load the snapshot times for that day. Select the most recent snapshot date and time to reduce the chance of getting outdated data. Click a snapshot time from the options to load the target website's archive.

Harvey Norman Web Archive

The loaded archive returns a snapshot of the protected website, as shown:

Harvey Norman Archive Version

Once the above archive loads, copy the snapshot URL from the address bar. Open that URL and extract its data with your scraper. The URL looks something like this:

Example
https://web.archive.org/web/20240920195434/https://www.harveynorman.com.au/
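The snapshot URL follows a visible pattern: the archive prefix, a 14-digit timestamp (YYYYMMDDhhmmss), and the original URL. As a rough sketch, you could build such URLs in code instead of copying them by hand. The helper names below are hypothetical. The second helper assumes the JSON shape returned by the Internet Archive's availability endpoint (`https://archive.org/wayback/available?url=...`), so verify it against a live response before relying on it.

```python
from typing import Optional

WAYBACK_PREFIX = "https://web.archive.org/web"


def build_snapshot_url(timestamp: str, target_url: str) -> str:
    """Compose a snapshot URL from a YYYYMMDDhhmmss timestamp and a page URL."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{target_url}"


def closest_snapshot_url(availability: dict) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response (assumed shape)."""
    closest = availability.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


print(build_snapshot_url("20240920195434", "https://www.harveynorman.com.au/"))

# Usage with Requests (not executed here):
# data = requests.get(
#     "https://archive.org/wayback/available",
#     params={"url": "https://www.harveynorman.com.au/"},
# ).json()
# print(closest_snapshot_url(data))
```

The first helper reproduces the example URL above. The second returns `None` when no snapshot is reported, which your scraper should handle before requesting the page.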

While the above method works sometimes, one limitation is that you might end up with outdated data if the website's content has changed since the last snapshot. Additionally, the archive website may implement an anti-bot measure to block your scraper from accessing snapshots.

Another way to bypass Incapsula is to use a smart proxy.

Method #4: Use Smart Proxies to Get Past Incapsula Imperva

Some websites only trigger the Imperva Incapsula anti-bot if the request comes from a geo-restricted IP, a suspicious one, or when an IP exceeds the permissible request limit.

A proxy routes your request through another IP, making it appear as if it's from a different location or machine. You can use free or premium proxies. However, free ones have a short lifespan and are unreliable. 

The most reliable proxies for scraping are premium residential ones. These proxies distribute traffic over a pool of IPs that internet service providers assign to everyday internet users.

This IP distribution model lets you mimic different users and reduces the chance of hitting an IP-triggered Incapsula anti-bot during scraping.
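As an illustration of that distribution model, the sketch below rotates through a small proxy pool and yields the `proxies` mapping that Requests accepts. The proxy addresses and credentials are placeholders, and the `proxy_rotation` helper is a hypothetical name. Premium providers often expose a single rotating endpoint instead, in which case you wouldn't need to rotate manually.

```python
from itertools import cycle

# Placeholder proxy URLs: replace with the ones from your provider
PROXY_POOL = [
    "http://<USERNAME>:<PASSWORD>@proxy1.example.com:8000",
    "http://<USERNAME>:<PASSWORD>@proxy2.example.com:8000",
    "http://<USERNAME>:<PASSWORD>@proxy3.example.com:8000",
]


def proxy_rotation(pool):
    """Yield a Requests-style proxies dict for each proxy, looping forever."""
    for proxy in cycle(pool):
        # Route both HTTP and HTTPS traffic through the same proxy
        yield {"http": proxy, "https": proxy}


# Usage with Requests (not executed here):
# rotation = proxy_rotation(PROXY_POOL)
# response = requests.get(
#     "https://www.harveynorman.com.au/", proxies=next(rotation), timeout=30
# )
```

Each call to `next(rotation)` moves to the next IP in the pool, so consecutive requests appear to come from different addresses.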

Read our guide on the best proxy providers for web scraping to see a list of top options.

The limitation of using only proxies is that you can still get blocked by advanced anti-bot measures, especially those using multiple bot detection techniques beyond IP reputation. You need extra measures to bypass anti-bots.

Conclusion

This step-by-step guide showed you how Incapsula Imperva works and how to bypass it using four approaches:

  • Use a web scraping API: The most reliable method to bypass the Incapsula anti-bot page.
  • Implement a fortified headless browser: Recommended if your scraping task requires complex automation.
  • Scrape the target website's archive: Retrieves content snapshots without hitting the live anti-bot, but can result in outdated data.
  • Integrate smart proxies: Helps avoid IP-triggered Incapsula CAPTCHAs, but doesn't work against advanced Incapsula implementations.

ZenRows, an all-in-one web scraping solution, is the most reliable way to bypass Imperva Incapsula at scale. It offers many benefits and features, including anti-bot bypass, JavaScript rendering, proxy rotation, super-fortified scraping browsers with advanced fingerprint management, and more.

Try ZenRows for free now without a credit card!
