3 Ways to Bypass PerimeterX With Playwright

September 10, 2024 · 4 min read

Table of contents

How does PerimeterX work?
- IP filtering
- Request header analysis
- Browser fingerprinting
- Behavioral analysis
Best ways to bypass PerimeterX
- Use Playwright Stealth plugin
- Get premium proxies
- Use Brave as a browser
Conclusion

Does your Playwright scraper keep getting blocked by the "Press & Hold to confirm you're a human (and not a bot)" message?

The PerimeterX anti-bot, a sophisticated Web Application Firewall (WAF), is working against your scraper! Unfortunately, Playwright can't bypass PerimeterX on its own.

But we're here to help you. In this article, you'll learn how PerimeterX works and the three best methods to bypass it while scraping with Playwright.

How Does PerimeterX Work?

PerimeterX (now merged with HUMAN Security) is an advanced anti-bot measure that defends websites and apps against bots and malicious activities, such as account fraud, advertising fraud, and other client-side threats.

Unfortunately, web scrapers fall within the bot category. So, if you scrape a PerimeterX-protected website, there's a high chance its security measures will block you.

PerimeterX efficiently filters bots under the hood without impacting human users. It employs server-side and client-facing techniques to identify automated requests. Let's discuss some of these briefly.

IP Filtering

PerimeterX checks the IPs of incoming requests against a pool of bot-like IP addresses. These include IPs from datacenter proxies, VPNs, and IPs with bad bot reputation scores reported by other anti-bot systems. PerimeterX will block your request if your IP falls within any of these categories. During large-scale web scraping, it may also block an IP that sends heavy traffic. It can then use techniques such as machine learning to flag that IP, preventing it from accessing the protected website in subsequent requests.
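
To make the idea concrete, here is a minimal sketch of an IP range check using Python's standard ipaddress module. It is not PerimeterX's implementation: the SUSPICIOUS_NETWORKS list and the is_suspicious_ip name are assumptions for illustration, and the ranges are reserved documentation addresses standing in for real datacenter or VPN ranges.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges (RFC 5737)
# standing in for real datacenter or VPN ranges
SUSPICIOUS_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_suspicious_ip(ip: str) -> bool:
    """Return True if the address falls inside any listed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in SUSPICIOUS_NETWORKS)

print(is_suspicious_ip("203.0.113.25"))  # inside the first listed range
print(is_suspicious_ip("192.0.2.10"))    # outside both listed ranges
```

A real system would likely combine a lookup like this with reputation scores and traffic volume rather than rely on a static list.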

Request Header Analysis

When dealing with an anti-bot measure like PerimeterX, your request headers can expose you as a bot. PerimeterX analyzes the request headers for bot-like signals, such as a mismatch between the platform string in the User Agent and the platform header, missing headers, bot-like User Agents such as the HeadlessChrome token that Playwright sends in headless mode, and more. Generally, you're more likely to get blocked if your request headers don't match those sent by an actual browser.
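
The checks described above can be sketched as a small function. This is only an illustration of the kind of consistency test an anti-bot might run, not PerimeterX's actual logic. The find_header_red_flags name, the PLATFORM_TOKENS mapping, and the choice of Sec-CH-UA-Platform as the platform header are assumptions.

```python
# Hypothetical mapping from a platform hint to the token
# a matching User-Agent string would normally contain
PLATFORM_TOKENS = {
    "Windows": "Windows NT",
    "macOS": "Mac OS X",
    "Linux": "Linux",
}

def find_header_red_flags(headers: dict[str, str]) -> list[str]:
    """Return a list of bot-like signals found in a request's headers."""
    # header names are case-insensitive, so normalize them first
    h = {name.lower(): value for name, value in headers.items()}
    flags = []

    user_agent = h.get("user-agent", "")
    if not user_agent:
        flags.append("missing User-Agent")
    if "HeadlessChrome" in user_agent:
        flags.append("HeadlessChrome in User-Agent")

    # compare the platform claimed by the hint with the User-Agent
    hint = h.get("sec-ch-ua-platform", "").strip('"')
    expected_token = PLATFORM_TOKENS.get(hint)
    if expected_token and expected_token not in user_agent:
        flags.append("platform mismatch")

    if "accept-language" not in h:
        flags.append("missing Accept-Language")
    return flags
```

You could run your scraper's outgoing headers through a check like this to spot obvious inconsistencies before sending any requests.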

Browser Fingerprinting

Browser fingerprinting is another bot detection technique PerimeterX uses. It gathers extensive client information, such as User Agent data, hosting platform type and version, installed extensions, navigator properties, and more. For example, PerimeterX detects Playwright's bot-like fingerprints, such as the WebDriver property (navigator.webdriver). As an advanced measure, PerimeterX also compares the collected fingerprints with those in a database of known fingerprints and uses machine learning to identify deviations and detect bots.
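
As a rough sketch of the database comparison, the snippet below hashes a set of collected attributes into a stable identifier and looks it up in a set of known bot fingerprints. The attribute names, the fingerprint_id and looks_automated helpers, and the use of SHA-256 are all assumptions. Real fingerprinting collects far more signals and does not rely on exact matches alone.

```python
import hashlib
import json

def fingerprint_id(attributes: dict) -> str:
    """Hash the collected attributes into a stable identifier."""
    # sort keys so the same attributes always produce the same hash
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def looks_automated(attributes: dict, known_bot_ids: set[str]) -> bool:
    """Flag the client if WebDriver is exposed or its fingerprint is known."""
    if attributes.get("webdriver") is True:
        return True
    return fingerprint_id(attributes) in known_bot_ids

# hypothetical fingerprint previously recorded for an automated browser
bot_fingerprint = {"userAgent": "HeadlessChrome", "webdriver": False, "plugins": 0}
known_bot_ids = {fingerprint_id(bot_fingerprint)}

print(looks_automated(bot_fingerprint, known_bot_ids))
```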

Behavioral Analysis

Mouse movements, clicks, form filling, navigation patterns, and browsing sessions reveal a lot about a user's behavior. Humans and bots behave differently when interacting with a website. For instance, opening thousands of browsing sessions in quick succession or filling out a registration form within a few milliseconds is typical of automation tools like Playwright. PerimeterX deploys measures to block such deviations from a human usage pattern.
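
To illustrate the timing difference, here is a small sketch that generates randomized per-character typing delays, which is closer to how a person types than filling a field instantly. The typing_delays helper and its default bounds are assumptions chosen for illustration, and nothing here guarantees passing a behavioral check.

```python
import random

def typing_delays(text: str, min_ms: int = 60, max_ms: int = 220) -> list[int]:
    """Return one randomized delay in milliseconds per character of text."""
    return [random.randint(min_ms, max_ms) for _ in text]

delays = typing_delays("hello")
print(delays)
```

In a Playwright script, you could type one character at a time with page.keyboard.type() and pause between characters with page.wait_for_timeout(), using these values.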

Due to their automation features, all headless browsers are vulnerable to these detection techniques. For example, let's see how Playwright performs against Zillow, a PerimeterX-protected website.

Try it out with the following script that takes a screenshot of the target site's homepage:

# import the required libraries
import asyncio
from playwright.async_api import async_playwright

async def scraper():
    async with async_playwright() as p:
        # launch the Chromium browser
        browser = await p.chromium.launch(headless=False)

        # open a new browser context
        context = await browser.new_context()

        # open a new page
        page = await context.new_page()

        # navigate to the target site
        await page.goto('https://www.zillow.com/')

        # take a screenshot of the homepage
        await page.screenshot(path='zillow_homepage_screenshot.png')

        # close the browser
        await browser.close()

# run the scraper function asynchronously
asyncio.run(scraper())

The above Playwright scraper accessed the target website for the first few requests but later got blocked because PerimeterX flagged and blocked the request's IP. See the output screenshot below with the PerimeterX CAPTCHA:

Screenshot: Zillow Homepage

So, how can you bypass PerimeterX bot detection techniques and access your desired data with Playwright?

Best Ways to Bypass PerimeterX With Playwright

PerimeterX combines the previously listed bot detection methods to block scrapers effectively. Even if you escape IP filtering and request header analysis, you're unlikely to get past browser fingerprinting, behavioral analysis, and other background JavaScript challenges.

In this section, you'll learn the three best methods to bypass PerimeterX and scrape without getting blocked.

Method #1: Use Playwright Stealth Plugin

The Playwright Stealth plugin is a helper for bypassing anti-bot detection while scraping with Playwright. When added to your Playwright scraper, the plugin patches the automated browser with human-like fingerprints.

For example, the Stealth plugin hides the WebDriver property that Playwright exposes, replaces the HeadlessChrome token in the User Agent with a regular Chrome one, and modifies the Chrome instance to run as if it's in GUI mode even if you've used headless mode. All of these improve your chances of bypassing fingerprinting and request header checks.
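
To show what one such patch looks like, here is a minimal sketch that hides the WebDriver property by hand with Playwright's add_init_script method. It is not the Stealth plugin's code and covers only one of the signals the plugin handles. The HIDE_WEBDRIVER_JS constant, the check_webdriver function, and the example.com target are assumptions for illustration.

```python
import asyncio

# JavaScript that runs before any page script and redefines the property
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""

async def check_webdriver() -> None:
    # imported here so the constant above can be reused without Playwright
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()

        # register the patch for every page opened in this context
        await context.add_init_script(HIDE_WEBDRIVER_JS)

        page = await context.new_page()
        await page.goto("https://example.com/")

        # read the property back to see whether the patch took effect
        value = await page.evaluate("navigator.webdriver")
        print("navigator.webdriver:", value)

        await browser.close()

# to run the check: asyncio.run(check_webdriver())
```

A single patch like this is easy to detect on its own, which is why the plugin bundles many of them.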

Read our detailed tutorial on using Playwright Stealth for scraping to learn more about its implementation.

However, even the Stealth plugin isn't foolproof against detection, as it still doesn't cover all the detection techniques used by PerimeterX. Additionally, the anti-bot receives consistent security updates, making it difficult for the Stealth plugin to keep up with new detection methods.

That said, there's still more you can do to evade PerimeterX detection.

Method #2: Get Premium Proxies

A proxy sends requests on your behalf, making it look like you're requesting from a different location. Using proxies with Playwright helps mask your IP address and prevent IP bans due to rate limiting or geolocation restrictions.

You can use free or premium proxies for web scraping. Free ones are available on websites like the Free Proxy List. However, due to their short lifespan, free proxies are only good for testing rather than for real-life projects.

The best option for a high success rate is auto-rotating premium proxies. These are more reliable for avoiding rate-limit IP bans during large-scale scraping because they rotate your IP from a pool containing millions of real users' IP addresses. This makes each request look like it comes from a different user.
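
Auto-rotating services handle rotation for you on the server side. To show the underlying idea, here is a minimal sketch of manual round-robin rotation over a small pool. The proxy hostnames, credentials, and the proxy_rotator helper are hypothetical placeholders.

```python
from itertools import cycle

# Hypothetical proxy endpoints and credentials, for illustration only
PROXY_POOL = [
    {"server": "http://proxy-1.example.com:8000", "username": "user", "password": "pass"},
    {"server": "http://proxy-2.example.com:8000", "username": "user", "password": "pass"},
    {"server": "http://proxy-3.example.com:8000", "username": "user", "password": "pass"},
]

def proxy_rotator(pool):
    """Return an iterator that yields proxies in round-robin order, forever."""
    return cycle(pool)

rotator = proxy_rotator(PROXY_POOL)

# the fourth pick wraps around to the first proxy again
first_four = [next(rotator)["server"] for _ in range(4)]
print(first_four)
```

Each dictionary has the same shape as the proxy option that Playwright's launch() accepts, so you could pass next(rotator) as the proxy argument each time you start a browser.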

One of the best premium proxy providers on the market is ZenRows, a web scraping toolkit that offers Residential Proxies with auto-rotation and geo-targeting. When you opt for a premium proxy subscription on ZenRows, you gain access to all the tools you need to avoid getting blocked, including anti-bot auto-bypass, anti-CAPTCHA, JavaScript rendering, and more.

To integrate the proxy service, first sign up to ZenRows.

Then, copy your proxy credentials from the ZenRows Proxies Generator:

Screenshot: generate residential proxies with ZenRows

Then, include your credentials in the previous Playwright script, as shown:

# import the required libraries
import asyncio
from playwright.async_api import async_playwright

async def scraper():
    async with async_playwright() as p:

        # launch the Chromium browser
        browser = await p.chromium.launch(
            headless=False,
            proxy={
               'server': 'superproxy.zenrows.com:1337',
               'username': '<YOUR_ZENROWS_PROXY_USERNAME>',
               'password': '<YOUR_ZENROWS_PROXY_PASSWORD>',
            },
        )

        # open a new browser context
        context = await browser.new_context()

        # increase the default timeout to wait for content to load
        context.set_default_timeout(60000)

        # open a new page
        page = await context.new_page()

        # navigate to the target site
        await page.goto('https://www.zillow.com/')

        # take a screenshot of the homepage
        await page.screenshot(path='zillow_homepage_success_screenshot.png')

        # close the browser
        await browser.close()

# run the scraper function asynchronously
asyncio.run(scraper())

The above code returns the following screenshot, confirming access to the target site's homepage:

Screenshot: Zillow Homepage Bypass Success

Congratulations! You just bypassed PerimeterX by integrating ZenRows proxy with your Playwright scraper.

Because ZenRows rotates your IP, you now have lower chances of getting blocked.

Method #3: Use Brave as a Browser

The Brave browser has privacy features for blocking trackers, fingerprinting, unwanted scripts, and more. You can leverage these to avoid detection when scraping with Playwright.

The process involves pointing Playwright to your Brave browser executable and using it to run your Playwright sessions. Let's see how it works using the same target website (Zillow).

First, grab your Brave browser's executable path. Its installation defaults to the following directory on Windows. However, your machine's path may differ depending on your installation settings. You may also need to show hidden folders/files to make it visible in this directory:

C:/Program Files/BraveSoftware/Brave-Browser/Application/brave.exe

On Linux, the executable usually defaults to one of the following paths. As mentioned, this may differ depending on your installation settings:

# if installed via apt:
/usr/bin/brave-browser

# if installed via snap:
/snap/bin/brave
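
Since the executable path varies between machines, a small helper can look it up instead of hard-coding it. The sketch below checks the system PATH first and then falls back to a list of common locations. The find_brave name and the CANDIDATE_PATHS list are assumptions, so adjust them for your system.

```python
import os
import shutil

# Common install locations; treat these as guesses for your system
CANDIDATE_PATHS = [
    "C:/Program Files/BraveSoftware/Brave-Browser/Application/brave.exe",
    "/usr/bin/brave-browser",
    "/snap/bin/brave",
]

def find_brave(candidates=CANDIDATE_PATHS):
    """Return the first Brave executable found, or None."""
    # check the PATH first
    for name in ("brave-browser", "brave"):
        found = shutil.which(name)
        if found:
            return found
    # fall back to known locations
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

print(find_brave())
```

If the helper returns None, Brave is either not installed or lives in a location you need to add to the list.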

Now, open your Python file, import the required libraries, and start an asynchronous scraper function. Next, specify your Brave executable file path:

# import the required libraries
import asyncio
from playwright.async_api import async_playwright

async def scraper():
    async with async_playwright() as p:
       
        # specify the path to the Brave executable
        # replace with the actual path to your Brave executable
        brave_path = 'C:/Program Files/BraveSoftware/Brave-Browser/Application/brave.exe'

Point Playwright to the Brave executable path, start a new browser context, open a new page, and go to the target website. Then, take a screenshot of its homepage. Finally, close the browser and execute the scraper function using asyncio:

# ...

async def scraper():
    async with async_playwright() as p:
 
        # ...
       
        # launch the browser using the specified Brave executable
        browser = await p.chromium.launch(executable_path=brave_path, headless=False)
       
        # open a new browser context (an isolated browser session)
        context = await browser.new_context()
       
        # open a new page
        page = await context.new_page()
       
        # navigate to the target page
        await page.goto('https://www.zillow.com/')
       
        # take a screenshot of the homepage
        await page.screenshot(path='zillow_access_homepage.png')
       
        # close the browser
        await browser.close()

# run the scraper function asynchronously
asyncio.run(scraper())

Here's the full code after combining both snippets:

# import the required libraries
import asyncio
from playwright.async_api import async_playwright

async def scraper():
    async with async_playwright() as p:

        # specify the path to the Brave executable
        # replace with the actual path to your Brave executable
        brave_path = 'C:/Program Files/BraveSoftware/Brave-Browser/Application/brave.exe'

        # launch the browser using the specified Brave executable
        browser = await p.chromium.launch(executable_path=brave_path, headless=False)
       
        # open a new browser context (an isolated browser session)
        context = await browser.new_context()
       
        # open a new page
        page = await context.new_page()
       
        # navigate to the target page
        await page.goto('https://www.zillow.com/')
       
        # take a screenshot of the homepage
        await page.screenshot(path='zillow_access_homepage.png')
       
        # close the browser
        await browser.close()

# run the scraper function asynchronously
asyncio.run(scraper())

The above code makes Playwright drive Brave instead of its bundled Chromium, which may improve your chances of avoiding the PerimeterX CAPTCHA.

Conclusion

While Playwright Stealth and Brave can help you bypass browser fingerprinting and request header analysis, premium proxies prevent IP restrictions. Auto-rotating premium proxies are the best solution for large-scale web scraping, as they save you the trouble of manual setup and allow you to scrape more efficiently.

Combining these methods will give the best result.
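
As a sketch of how the pieces could fit together, the helper below assembles the launch options for a Brave executable and a proxy in one place. The build_launch_options name and the placeholder values are assumptions. The executable_path, proxy, and headless keys are the same launch arguments used in the scripts above.

```python
def build_launch_options(executable_path=None, proxy=None, headless=False):
    """Assemble keyword arguments for Playwright's chromium.launch()."""
    options = {"headless": headless}
    if executable_path:
        options["executable_path"] = executable_path
    if proxy:
        options["proxy"] = proxy
    return options

# placeholder values; replace them with your own path and credentials
options = build_launch_options(
    executable_path="/usr/bin/brave-browser",
    proxy={
        "server": "superproxy.zenrows.com:1337",
        "username": "<YOUR_ZENROWS_PROXY_USERNAME>",
        "password": "<YOUR_ZENROWS_PROXY_PASSWORD>",
    },
)
print(sorted(options))
```

Inside your scraper, you would then start the browser with p.chromium.launch(**options).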

If you're ready to try out premium Residential Proxies, check out ZenRows. On top of auto-rotating proxies, it gives you all the features you need to bypass any anti-bot system at scale at a competitive price.