How to Bypass PerimeterX in 2023: The 6 Best Methods

May 9, 2023 · 16 min read

You found a website you want to scrape, coded your scraper, and ran it, only to realize PerimeterX has blocked you. You're not alone in this struggle!

PerimeterX (now called HUMAN) uses sophisticated server and client-side techniques to identify and block bots like your web scraper. However, you'll be able to bypass PerimeterX and retrieve the data you need by following the methods outlined in this article.

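Here are the approaches that'll get you through:

  • Employ smart proxies.
  • Use fortified headless browsers.
  • Use an API for PerimeterX bypass.
  • Bypass (or avoid) PerimeterX CAPTCHAs.
  • Scrape Google Cache.
  • Reverse engineer PerimeterX.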

But first, let's learn how this firewall works.

How Does PerimeterX Work

PerimeterX was founded in 2014 and has specialized in protecting websites from bots ever since, so they know what they're doing when it comes to blocking bots.

PerimeterX uses both passive and active bot detection. Passive bot detection refers to the checks it runs on its servers when it receives a request from a visitor. Active bot detection refers to the scripts it runs in the visitor's browser to gather information and detect bots.

What's more, PerimeterX claims its detection system protects websites from bots with minimal impact on the user experience. In other words, it tries not to bother human users with CAPTCHAs or authentication waiting screens unless it suspects a request comes from a bot.

PerimeterX Bot Detection Techniques

The list below isn't exhaustive, but it covers the most aggressive defenses PerimeterX deploys. We'll touch on how they work and then focus on how to overcome them.

1. IP Filtering

Security companies like PerimeterX usually keep huge lists of IPs known to be used by bots. They can also identify groups of IPs that belong to data centers, proxies, or VPN providers. Web Application Firewalls (WAFs) usually assign a score or reputation to each IP that tries to access the protected website. If the IP your bot is using has a bad reputation, you'll probably get blocked.

2. Checking HTTP Request Headers

Lots of bots use libraries or other non-browser agents like python-requests or Axios. These agents usually don't send some of the headers that typical browsers add to their requests. This is one of the simplest ways anti-bot systems like PerimeterX Bot Defender use to identify and block bots. Luckily, it's easy to add HTTP headers to your requests to bypass this protection.
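The snippet below is a minimal sketch that uses only Python's standard library to attach browser-like headers to a request. The header values are illustrative and come from a typical desktop Chrome session. PerimeterX isn't known to require this exact set, and headers alone won't hide other signals such as your TLS fingerprint. The request is only built here. To send it, call urllib.request.urlopen(req).

```python
from urllib.request import Request

# Illustrative browser-like headers (assumed values from a desktop Chrome session).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Upgrade-Insecure-Requests": "1",
}


def build_request(url: str) -> Request:
    """Return a GET request carrying browser-like headers (not sent yet)."""
    return Request(url, headers=BROWSER_HEADERS, method="GET")


req = build_request("https://www.ssense.com/en-ca")
# urllib stores header names via str.capitalize(), e.g. "User-agent".
print(req.get_header("User-agent"))
```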

3. Behavioral Analysis

PerimeterX is very proud of its use of machine learning for behavioral analysis, which lets it identify bots based on how they behave. For example, its system has learned that IPs making hundreds of requests in a short time are usually bots. When it detects this kind of behavior, it usually blocks access to the protected page.
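The simplest counter to this kind of rate-based signal is to space requests out. PerimeterX doesn't publish its thresholds, so the delays below (2 to 3.5 seconds) are placeholder assumptions. The sketch inserts a randomized pause between consecutive requests. The sleep function is injectable, which lets you test the pacing without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional


def paced(
    urls: Iterable[str],
    base_delay: float = 2.0,
    jitter: float = 1.5,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield URLs one at a time, pausing base_delay + U(0, jitter) seconds between them."""
    rng = rng or random.Random()
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized gap instead of a fixed, machine-like interval.
            sleep(base_delay + rng.uniform(0, jitter))
        yield url


# Usage: fetch each URL inside the loop; here the delays are recorded instead of slept.
delays = []
pages = list(paced(["/page/1", "/page/2", "/page/3"], rng=random.Random(42), sleep=delays.append))
```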

4. Fingerprinting and Blacklisting

Some of the methods we mentioned, like behavioral analysis or checking HTTP request headers, can be combined with others, such as TLS fingerprinting, to identify visitors even when they switch IPs. Once the Web Application Firewall (WAF) identifies a visitor as a bot, it adds the visitor to a blacklist to block future visits.
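To make TLS fingerprinting concrete, here is a sketch of JA3, a well-known public scheme that hashes fields of the TLS ClientHello a client sends. PerimeterX doesn't document which TLS fingerprint it computes, so this shows the general idea rather than its implementation. The ClientHello values below are made up. The key point is that an HTTP library produces the same ClientHello, and therefore the same hash, whatever IP or headers it uses.

```python
import hashlib
from typing import Iterable, Tuple

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def _join(values: Iterable[int]) -> str:
    """Dash-join decimal values, skipping GREASE entries."""
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3_fingerprint(version: int, ciphers: Iterable[int], extensions: Iterable[int],
                    curves: Iterable[int], point_formats: Iterable[int]) -> Tuple[str, str]:
    """Return (JA3 string, JA3 MD5 hash) for the fields of a TLS ClientHello."""
    text = ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])
    return text, hashlib.md5(text.encode("ascii")).hexdigest()


# Made-up ClientHello: TLS 1.2 record version (771), one GREASE cipher (2570) included.
text, digest = ja3_fingerprint(771, [2570, 4865, 4866, 4867], [0, 23, 65281, 10, 11],
                               [29, 23, 24], [0])
```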

To learn some anti-block techniques against Passive Bot Detection, check out our article on how to do web scraping without getting blocked.

If PerimeterX still detects you after you've applied the techniques for bypassing passive bot defenses, its active bot-detection script is probably what's flagging your bot. If you're ready to create a PerimeterX CAPTCHA bypass, get ready to get your hands dirty with obfuscated JavaScript code and reverse engineering strategies.

Frustrated that your web scrapers are blocked over and over?
ZenRows API handles rotating proxies and headless browsers for you.

Method #1: Employ Smart Proxies to Bypass PerimeterX

You can use smart proxies to handle anti-bot challenges and return the necessary data or session cookies to access the content you need.

They aim to mimic human-like behavior by rotating residential proxies, randomizing User Agents, and emulating natural browsing patterns. It all happens behind the scenes, so you don't have to write lots of code yourself.

Smart proxies offer a higher level of anonymity compared to their standard counterparts. Therefore, their traffic is mostly indistinguishable from natural human traffic. That, amongst other factors, makes them a great tool for bypassing the PerimeterX bot detection system. 

ZenRows is an example of a smart proxy that empowers users to bypass PerimeterX and any bot detection system. You can imitate human behavior and solve anti-bot challenges by specifying your target URL.

Since ZenRows is an HTTP API, you can call it from any programming language, including Python, Java, Node.js, Go, and Ruby. Here's a quick example using Python:

# pip install requests
import requests

url = 'https://www.ssense.com/en-ca'
apikey = 'Your_API_Key'
params = {
    'url': url,
    'apikey': apikey,
    'antibot': 'true',
    'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)


print(response.text)
# ....<title>Luxury fashion &amp; independent designers | SSENSE Canada</title>...

Method #2: Use Fortified Headless Browsers

While headless browsers were initially designed for testing, they've evolved into essential web scraping tools. However, they expose automation traits that make them easy for anti-bot systems like PerimeterX to identify. A common example is the navigator.webdriver property, which is set to true in automated browsers.

That being said, the most popular browser automation tools, Selenium, Puppeteer, and Playwright, have community-made plugins and patches, such as Puppeteer Stealth, that help you fortify your web scraper.

These tools have proven helpful over the years, but they're open source, so PerimeterX can study them as easily as you can. That means they might not keep up with constantly evolving bot management systems like PerimeterX.

But even if they could, there are still downsides. For example, they consume significant CPU, memory, and bandwidth, so headless browsing inevitably raises scraping costs and causes performance issues.

Although you can take measures like blocking resources to improve performance, not loading them may also flag you as a bot.

Method #3: Use an API for PerimeterX Bypass

At this point, you may be wondering: "Isn't there a reliable, ready-made PerimeterX bypass?"

The harsh reality is that, in 2023, it's tough to bypass the PerimeterX anti-bot service using public software, like the libraries you can find on GitHub. However, some of them, like Puppeteer Stealth, are worth checking out.

Also, standard headless browsers, such as Chrome, Chromium, or Firefox driven by Selenium, need very specific configurations to work. Furthermore, because the source code of such software is public, PerimeterX developers can update their anti-bot system to detect it.

One option is to code your own PerimeterX bypass, although the easiest way is to use private software designed for the job. One reliable example is ZenRows. 

Our team of coders has invested countless hours of work in developing an API for web scraping. It effectively avoids anti-bot systems, like PerimeterX, and you can get your free API key right away.

Method #4: PerimeterX CAPTCHA Bypass

PerimeterX might display CAPTCHAs as part of the challenges you have to pass to access website content. Sometimes they're only presented when suspicious activities, such as too many requests, are detected. That gives you two approaches to bypassing them:

  • Preventing CAPTCHAs from being triggered.
  • Solving them when they appear.

The first one is the recommended approach, as it's more reliable at scale and a lot cheaper. Solutions like fortified headless browsers, smart proxies, etc., can help you fly under the radar.

On the other hand, once a CAPTCHA is presented, you have to solve it. The practical option is a paid CAPTCHA-solving service like 2Captcha, which employs people to solve challenges manually and returns the solution through its API.

Method #5: Scrape Google Cache

When Google crawls websites for indexing, it caches their pages. So instead of requesting pages from the target website, you can ask Google for its cached copies. However, this method is only viable if the data you're after doesn't change often, since the cached copy may be outdated. Also, not all websites allow caching, so it may not work for your target.

To scrape a website's cached data, send a request to its Google cache URL, which typically follows this format:

https://webcache.googleusercontent.com/search?q=cache:{website_url}
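A small helper that builds this URL could look like the sketch below. Percent-encoding the embedded URL is our own precaution, so that a target URL's ?, &, and = aren't read as parameters of the cache URL. The format above isn't documented to require it.

```python
from urllib.parse import quote

CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(website_url: str) -> str:
    """Build the Google cache URL for a page, following the format above."""
    # Keep ':' and '/' readable; encode '?', '&', '=' and other reserved characters.
    return CACHE_PREFIX + quote(website_url, safe=":/")


print(google_cache_url("https://www.ssense.com/en-ca"))
```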

Method #6: Reverse Engineer PerimeterX to Bypass It

One way to bypass the PerimeterX Bot Defender is to reverse engineer its checks and challenges. These are the steps:

  1. Analyze the network logs.
  2. Deobfuscate the PerimeterX JavaScript challenge script.
  3. Analyze the deobfuscated script and the subsequent checks.

How to Create a PerimeterX Bypass

To reverse engineer the firewall, you first need to understand its internals. We'll mainly use JavaScript, but the techniques in this tutorial will let you code your PerimeterX bypass in Python or any other language.

In our example, we'll analyze the anti-bot implementation on SSENSE, an e-commerce site protected by PerimeterX. It makes a good example because many e-commerce sites use PerimeterX.

Ready?

Step 1: Analyze the Network Log

First, open up the developer tools for the web browser of your choice and switch to the "Network" tab.

Next, keep the developer tools open and navigate to SSENSE. As the page loads, you'll notice many requests appear in the Network log. These are the important ones to take note of, in chronological order:

An initial GET request to https://www.ssense.com/en-ca. Looking at the response, you'll see a Set-Cookie header for _pxhd. This is an important cookie: it acts as a session indicator and will also be used in future requests. Your PerimeterX bypass will need some data from this cookie to calculate the correct values that will be sent for validation to the server.

GET Request

Also note that the response body's HTML contains a <script> tag that loads the PerimeterX challenge script:

<script type="text/javascript"> 
	(function () { 
		window._pxAppId = "PX58Asv359"; 
		if (window._pxAppId) { 
			var p = document.getElementsByTagName("script")[0], 
				s = document.createElement("script"); 
			s.async = 1; 
			s.src = "/" + window._pxAppId.substring(2) + "/init.js"; 
			p.parentNode.insertBefore(s, p); 
		} 
	})(); 
</script>

Next comes a GET request to /<_pxAppId>/init.js, where <_pxAppId> is the value of window._pxAppId without its "PX" prefix (that's what the substring(2) call strips). This returns the script PerimeterX uses for client-side bot detection. It's obfuscated and minified, so you won't be able to understand much of it for now.
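To automate this step, a regular expression over the HTML is enough for a sketch. It assumes the loader snippet keeps the window._pxAppId = "..." shape shown above. The helper names are our own.

```python
import re
from typing import Optional

# Matches the assignment in the loader snippet, e.g. window._pxAppId = "PX58Asv359";
APP_ID_RE = re.compile(r'window\._pxAppId\s*=\s*["\']([^"\']+)["\']')


def find_px_app_id(html: str) -> Optional[str]:
    """Return the value assigned to window._pxAppId in the page, if any."""
    match = APP_ID_RE.search(html)
    return match.group(1) if match else None


def init_script_path(app_id: str) -> str:
    """Mirror the loader: "/" + _pxAppId.substring(2) + "/init.js"."""
    return "/" + app_id[2:] + "/init.js"


html = '<script>(function () { window._pxAppId = "PX58Asv359"; })();</script>'
app_id = find_px_app_id(html)
print(app_id, init_script_path(app_id))  # PX58Asv359 /58Asv359/init.js
```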

Initial Request

Then comes a POST request to /<_pxAppId>/xhr/api/v2/collector. Its body is a string with content-type application/x-www-form-urlencoded and contains the following fields:

  • <payload> is an encrypted and Base64-encoded string.
  • <appId> is the previously defined value of window._pxAppId.
  • <tag> is a version tag (static per site), ex. v8.0.2.
  • <uuid> is a randomly generated UUID, ex. 4420aff0-351d-11ed-95d0-c137f4896ca9.
  • <ft> is an integer (static per site), ex. 278.
  • <seq> has the value 0.
  • <en> has the value NTA.
  • <pc> is an integer, ex. 3195683956001701.
  • <pxhd> is the value of the _pxhd cookie.
  • <rsc> has the value 1.

Collector Request
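As a rough sketch, you could assemble the body of that first request with urllib.parse.urlencode, which produces exactly this content type. The placeholder values are ours. The real payload has to come from the script's cipher, and pxhd from the _pxhd cookie. Defaulting uuid to uuid1() is an inference from the samples: their third group starts with 1, which is the UUID version digit.

```python
import uuid
from typing import Optional, Union
from urllib.parse import parse_qs, urlencode

Value = Union[str, int]


def collector_body(payload: str, app_id: str, tag: str, ft: Value, seq: int,
                   pc: Value, pxhd: str, rsc: int, uid: Optional[str] = None) -> str:
    """Form-encode the fields of the first collector request, in the order listed above.

    `payload` must already be encrypted and Base64-encoded; that cipher isn't reproduced here.
    """
    fields = [
        ("payload", payload),
        ("appId", app_id),
        ("tag", tag),
        # The sample UUIDs are version 1 (time-based), hence uuid1() as a default.
        ("uuid", uid or str(uuid.uuid1())),
        ("ft", ft),
        ("seq", seq),
        ("en", "NTA"),
        ("pc", pc),
        ("pxhd", pxhd),
        ("rsc", rsc),
    ]
    return urlencode(fields)


body = collector_body("ENCRYPTED_PAYLOAD", "PX58Asv359", "v8.0.2", 278, 0,
                      3195683956001701, "PXHD_COOKIE_VALUE", 1,
                      uid="4420aff0-351d-11ed-95d0-c137f4896ca9")
decoded = parse_qs(body)  # round-trip check: field name -> list of values
```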

The response body is a JSON object with a single top-level field: do. The do field contains an array of strings. The format is as follows:

{ 
	"do": [ 
		"sid|<sid>", // a string, ex. 4415dfc2-351d-11ed-a66d-7275714f5843 
		"pnf|cu", 
		"cls|<cls>", // an integer, ex. 85062563435994268828 
		"sts|<sts>", // a UNIX timestamp in milliseconds, ex. 1663263533114
		"wcs|<wcs>", // a string, ex. cchm6ba3onsi8miotj00
		"drc|<drc>", // an integer, ex. 4460
		"cts|<cts>|true", // a string, ex. 4415e33e-351d-11ed-a66d-7275714f5843
		"cs|<cs>", // a SHA-256 hash, ex. dd2d5dc601445d684b2c4249a4c68f300048446afd4fece93c44ae41f62bdda3
		"vid|<vid1>|<vid2>|true", // a string and an integer, ex. 43c15b2f-351d-11ed-97ec-797549415148 and 31536000 
		"sff|cc|60|<sff>" // a base64-encoded string, ex. U2FtZVNpdGU9TGF4Ow== 
	] 
}

After that, a second POST request goes to /<_pxAppId>/xhr/api/v2/collector. Its body has the same content-type as before and a similar format, with a few added fields:

  • <payload> is a much longer encrypted and Base64-encoded string.
  • <appId>, <tag>, <uuid>, <ft> and <pxhd> are the same as in the previous request.
  • <seq> has the value 1.
  • <en> has the value NTA.
  • <cs> is a SHA-256 hash, ex. dd2d5dc601445d684b2c4249a4c68f300048446afd4fece93c44ae41f62bdda3
  • <pc> is an integer, ex. 1670315818019117
  • <sid> is a string, ex. 4415dfc2-351d-11ed-a66d-7275714f5843
  • <vid> is a string, ex. 43c15b2f-351d-11ed-97ec-797549415148
  • <cts> is a string, ex. 4415e33e-351d-11ed-a66d-7275714f5843
  • <rsc> has the value 2.

If you take a closer look, you'll see that the cs, sid, vid and cts fields are derived directly from the JSON object returned from the first POST request.

Additionally, the values of seq and rsc have each increased by 1 relative to the first POST request. This continues for all following POST requests too, so these fields appear to act as request counters.
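Here is a sketch of the bookkeeping between the two requests. It copies sid, cts, cs, and vid out of the first response's do array and bumps both counters. It covers only the four fields named above. The response is a trimmed copy of the sample, and the helper name is our own.

```python
import json
from typing import Any, Dict

# Trimmed copy of the first collector response shown above.
FIRST_RESPONSE = json.dumps({"do": [
    "sid|4415dfc2-351d-11ed-a66d-7275714f5843",
    "pnf|cu",
    "sts|1663263533114",
    "cts|4415e33e-351d-11ed-a66d-7275714f5843|true",
    "cs|dd2d5dc601445d684b2c4249a4c68f300048446afd4fece93c44ae41f62bdda3",
    "vid|43c15b2f-351d-11ed-97ec-797549415148|31536000|true",
]})

CARRIED_OVER = ("sid", "cts", "cs", "vid")


def next_request_fields(response_text: str, previous: Dict[str, Any]) -> Dict[str, Any]:
    """Copy sid, cts, cs and vid from a collector response and bump seq/rsc by one."""
    fields = dict(previous)  # leave the caller's dict untouched
    for instruction in json.loads(response_text)["do"]:
        name, *args = instruction.split("|")
        if name in CARRIED_OVER:
            fields[name] = args[0]  # the value is the first argument
    fields["seq"] = previous["seq"] + 1
    fields["rsc"] = previous["rsc"] + 1
    return fields


first = {"uuid": "4420aff0-351d-11ed-95d0-c137f4896ca9", "seq": 0, "rsc": 1}
second = next_request_fields(FIRST_RESPONSE, first)
```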

PerimeterX sends another JSON object in the response body, once again containing an array of strings:

{ 
	"do": ["bake|_px2|330|<jwt>|true|300", "pnf|cu"] 
	 // where jwt is a JWT Token, ex. eyJ1IjoiNDQyMGFmZjAtMzUxZC0xMWVkLTk1ZDAtYzEzN2Y0ODk2Y2E5IiwidiI6IjQzYzE1YjJmLTM1MWQtMTFlZC05N2VjLTc5NzU0OTQxNTE0OCIsInQiOjE2NjMyNjM4MzQxMjIsImgiOiIwNzUzZDJhYTU1OWEzZDFhYjM5YjcyOGFmZDA0MDUyYWFlNDQ2MmU1NjMxNjZkNjM4MjM0NjZkNmNjMzIwY2ZlIn0= 
}
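Despite the <jwt> label, the sample value has no dot-separated header and signature, so it isn't a standard three-part JWT. As this sketch shows, it decodes as a single Base64 segment containing a small JSON object. Its u and v fields match the uuid and vid values from the collector requests. The analysis so far doesn't establish what t and h encode. t looks like a millisecond timestamp, and h looks like a 64-hex-digit hash.

```python
import base64
import json
from typing import Any, Dict

# Sample value from the "bake|_px2|330|<jwt>|true|300" instruction above.
TOKEN = (
    "eyJ1IjoiNDQyMGFmZjAtMzUxZC0xMWVkLTk1ZDAtYzEzN2Y0ODk2Y2E5IiwidiI6IjQzYzE1YjJm"
    "LTM1MWQtMTFlZC05N2VjLTc5NzU0OTQxNTE0OCIsInQiOjE2NjMyNjM4MzQxMjIsImgiOiIwNzUz"
    "ZDJhYTU1OWEzZDFhYjM5YjcyOGFmZDA0MDUyYWFlNDQ2MmU1NjMxNjZkNjM4MjM0NjZkNmNjMzIw"
    "Y2ZlIn0="
)


def decode_px2(token: str) -> Dict[str, Any]:
    """Base64-decode the token and parse the JSON object inside it."""
    padded = token + "=" * (-len(token) % 4)  # tolerate stripped padding
    return json.loads(base64.b64decode(padded))


claims = decode_px2(TOKEN)
print(claims["u"])  # 4420aff0-351d-11ed-95d0-c137f4896ca9
```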

You may have noticed that none of the POST requests contains a Set-Cookie response header. Typically, once a browser has passed bot-detection checks, an anti-bot system will set special cookies or headers for use in future requests. Then, once a client makes a request to a protected endpoint, those headers/cookies from the request get validated on the server side.

So, how does this work in the case of PerimeterX? If you make a request to an endpoint protected by PerimeterX, you won't see any unusual headers. You will, however, notice what seem to be PerimeterX-related cookies. For a cleaner overview, you can view all the cookies on the site and filter by the keyword px:

Cookies

These are PerimeterX's clearance cookies. They are checked on the server side to determine if a request should be blocked or forwarded to the origin. But remember, there's no record of these cookies being set with the Set-Cookie header in the network log. So, where are they coming from?

You might recognize the cookie names and values from the response bodies of the POST requests. This means the cookies must be set directly through JavaScript. That's consistent with the fact that none of the PerimeterX cookies carry the HttpOnly flag, since cookies set via document.cookie can't be HttpOnly.

Okay, great job! By analyzing the requests, we learned a lot about how PerimeterX behaves. Unfortunately, we're still missing a lot of information. We still don't know what data is contained in the encrypted payload field, how some other fields are generated, and what client-side bot detection checks the script performs. If you want to bypass PerimeterX, that knowledge is crucial.

If we want to answer those remaining questions, we have no choice but to directly consult the PerimeterX challenge script to figure out exactly how it works.

Step 2: Deobfuscate the PerimeterX JavaScript Challenge

To make the script unreadable to reverse engineers, PerimeterX obfuscates its JavaScript challenge. Here's a non-exhaustive list of the techniques it uses:

String Concealing. This technique replaces all references to string literals with a call to a decoder function. In the case of PerimeterX, strings are either Base64 encoded or additionally encrypted with a simple XOR cipher.

// String concealing example from the PerimeterX script 
 
// Creates an empty lookup cache for use in the decoding function 
var o = Object.create(null); 
 
/* ... */ 
 
// XOR Decryptor function 
// Returns the decoded string. 
// This function references some external variables and functions. 
// The n() and r() functions are related to recording timestamps, and are irrelevant to the decoding function. 
// The i() function is a polyfill function for atob (base64 decoding) 
// The o variable is defined earlier in the script as a cache. 
 
function c(t) { 
	// n() is irrelevant to the decoding 
	var a = n(), 
		e = o[t]; 
	if (e) u = e; // Try to look up the decoding string in the cache 
	else { 
		// i() is a polyfill function for atob 
		// Base64 decodes the input string 
		var c = i(t); 
 
		var u = ""; 
		// XOR decryption 
		for (var f = 0; f < c.length; ++f) { 
			var A = "dDqXfru".charCodeAt(f % 7); 
			u += String.fromCharCode(A ^ c.charCodeAt(f)); 
		} 
		// Store the result in the cache 
		o[t] = u; 
	} 
	return r(a), u; // r(a) is irrelevant to the decoding. 
} 
 
/* ... */ 
 
// Later on in the script, it's used like this: 
 
c("NBxAaVZGQg"); // => "PX11047"
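To check a decoded string yourself, here's a Python port of c(). It assumes the XOR key "dDqXfru" from this version of the script, which may change between releases. It uses lru_cache in place of the script's o lookup object. Note that atob() accepts unpadded Base64, while Python's b64decode needs the = padding restored.

```python
import base64
from functools import lru_cache

XOR_KEY = "dDqXfru"  # key hard-coded in the script version analyzed above


@lru_cache(maxsize=None)  # plays the role of the script's lookup cache `o`
def decode_string(encoded: str) -> str:
    """Python port of c(): Base64-decode, then XOR each byte with a repeating key."""
    raw = base64.b64decode(encoded + "=" * (-len(encoded) % 4))
    return "".join(
        chr(byte ^ ord(XOR_KEY[i % len(XOR_KEY)])) for i, byte in enumerate(raw)
    )


print(decode_string("NBxAaVZGQg"))  # PX11047
print(decode_string("NBxAaVZERw"))  # PX11062
```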

Proxy Variables/Functions. This technique replaces direct references to a variable/function's identifier with an intermediate variable.

/* Proxy function example */ 
 
// Decoding function from above 
function c(t) { 
	/* ... */ 
} 
 
// Intermediate variable declaration 
var r = c; 
 
// Calling r() instead of c() directly 
r("NBxAaVZERw"); // => "PX11062" 
 
/* Proxy variable example */ 
 
// Intermediate variable for the identifier "window" 
var F = window; 
 
// Referencing "F" instead of "window" directly 
F.performance.now();

Unary Expressions. Rather than using boolean literals or the undefined value directly, this technique relies on JavaScript's unary operators and their automatic type conversion.

var o = !0; // equivalent to o = true 
var c = !1; // equivalent to c = false 
void 0 === this.channels; // equivalent to undefined === this.channels
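A first deobfuscation pass can undo these idioms mechanically. The sketch below uses plain regular expressions, so it's only an approximation. It would also rewrite matches inside string literals, and it assumes undefined isn't shadowed by a local variable.

```python
import re

# Rewrite rules for the idioms above. The lookahead stops "!0" or "!1" from matching
# the start of a longer number such as "!0.5" or "!10".
UNARY_RULES = [
    (re.compile(r"!0(?![\w.])"), "true"),
    (re.compile(r"!1(?![\w.])"), "false"),
    (re.compile(r"\bvoid\s+0(?![\w.])"), "undefined"),
]


def simplify_unary(js: str) -> str:
    """Replace !0, !1 and void 0 with true, false and undefined."""
    for pattern, replacement in UNARY_RULES:
        js = pattern.sub(replacement, js)
    return js


print(simplify_unary("var o = !0; var c = !1; void 0 === this.channels;"))
```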

Though the PerimeterX challenge script's obfuscation may not be as sophisticated as that of other bot detection vendors, it still takes specialized reverse-engineering skills to turn it into readable code. Simply pasting it into a general-purpose JavaScript deobfuscator won't produce easily understandable code.

To deobfuscate the PerimeterX script, you'll need to create a custom deobfuscator. This step can be difficult, but it's essential for creating a PerimeterX bypass!
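Here is a minimal sketch of one custom pass, string inlining. It finds proxy aliases of the decoder (var r = c;) and replaces each decoder call with the decoded literal. It assumes the decoder is named c and uses the XOR key from the snippet above, and both vary between script versions. It's also regex-based. A production deobfuscator would typically work on an AST (for example, with Babel) to respect scoping.

```python
import base64
import json
import re

XOR_KEY = "dDqXfru"  # assumed key, taken from the analyzed script version


def decode_string(encoded: str) -> str:
    """Base64-decode, then XOR with the repeating key (same logic as c())."""
    raw = base64.b64decode(encoded + "=" * (-len(encoded) % 4))
    return "".join(chr(b ^ ord(XOR_KEY[i % len(XOR_KEY)])) for i, b in enumerate(raw))


def inline_concealed_strings(js: str, decoder: str = "c") -> str:
    """Replace decoder calls, direct or through proxy aliases, with string literals."""
    # Proxy functions: collect aliases declared as `var r = c;`.
    alias_re = re.compile(r"\bvar\s+([A-Za-z_$][\w$]*)\s*=\s*" + re.escape(decoder) + r"\s*;")
    names = {decoder, *alias_re.findall(js)}
    callee = "|".join(re.escape(name) for name in sorted(names))
    # A call such as r("NBxAaVZERw") that isn't part of a longer name or a member access.
    call_re = re.compile(r"(?<![\w$.])(?:" + callee + r')\(\s*"([A-Za-z0-9+/=]*)"\s*\)')
    # json.dumps yields a quoted, escaped literal that is also valid JavaScript.
    return call_re.sub(lambda m: json.dumps(decode_string(m.group(1))), js)


sample = 'var r = c;\nr("NBxAaVZERw");\nc("NBxAaVZGQg");'
print(inline_concealed_strings(sample))
```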

Once you've deobfuscated the PerimeterX challenge script, you can read it to determine what bot detection checks are performed and how to replicate the challenge-solving behavior. In the next step, we're going to go over the deobfuscated script and try to extract critical information about its internals.

Step 3: Analyze the Deobfuscated PerimeterX Script

Let's start by figuring out how the payload is encrypted!

PerimeterX's Payload Encryption

To figure out how the payload is encrypted so we can code our custom PerimeterX bypass, we're going to work backward. First, we find where the payload is set by searching for the string "payload=" in the deobfuscated script:

var B = { 
	vid: cn, 
	tag: ff.Bn, 
	appID: ff.J, 
	cu: Uo, 
	cs: f, 
	pc: A, 
}; 
var N = Wc(n, B); 
var l = [ 
	"payload=" + N, 
	"appId=" + ff.J, 
	"tag=" + ff.Bn, 
	"uuid=" + Uo, 
	"ft=" + ff.Nn, 
	"seq=" + Uu++, 
	"en=NTA", 
];

The final value for payload is stored in the variable N. Looking at the definition of N, we can determine that the Wc function is responsible for payload encryption. Wc takes in two parameters:

  • n: a JavaScript object that stores the raw payload data.
  • B: a JavaScript object that stores some values used as keys in the encryption process.

Let's look up the definition of Wc:

var Wc = function (n, r) {
	var t; 
	var a = n.slice(); 
	t = nc || "1604064986000"; 
	var e = zr(Un(t), 10); 
	var i = z(a); 
	a = Un(zr(i, 50)); 
 
	var c = (function (n, r, t) { 
		var a, e, i, o, c; 
		var u = zr(Un(t), 10); 
		var f = []; 
		var A = -1; 
 
		for (var B = 0; B < n.length; B++) { 
			/* ... */ 
		} 
 
		for (var v = 0; n.length > v; v++) { 
			/* ... */ 
		} 
 
		return f.sort(function (n, r) { 
			return n - r; 
		}); 
	})(e, a.length, r[Hc]); 
 
	a = (function (n, r, t) { 
		/* ... */ 
		return (a += r.substring(e)); 
	})(e, a, c); 
 
	return a; 
};

This is PerimeterX's encryption cipher. The original function is quite long and references many external variables/functions. For the sake of practicality, we've truncated it.

However, there are some important things you can learn about this cipher by looking at the fully deobfuscated PerimeterX script:

  • The payload uses two encryption keys: the values of uuid and sts.
  • uuid appears in every POST request, while sts appears from the 2nd POST request onwards. For the 1st POST request, where sts is absent, "1604064986000" is used in its place.
  • This is a symmetric-key algorithm. Therefore, as long as you have the original sts and uuid values, you can decrypt any encrypted PerimeterX payload. This is useful for analyzing the payload your browser sends, since the keys are always sent in the POST request along with the encrypted content.

How PerimeterX Sets Cookies

We previously concluded that all PerimeterX-related cookies were set by the script itself. Recall that the raw value of the _px2 cookie first appeared inside a JSON-formatted response body (as <jwt>):

{ 
	"do": ["bake|_px2|330|<jwt>|true|300", "pnf|cu"]
}

The field name, do, turns out to be quite literal: its value is an array of instructions. Each string is split on every | into an array. For the first string in the do array, that looks like this:

// The first instruction 
var processedInstruction1 = "bake|_px2|330|<jwt>|true|300".split("|"); // => ["bake","_px2","330","<jwt>","true","300"]

The first element of the resulting array determines the function to be executed, while the remaining elements are taken as the arguments for the function. In this case, bake is the name of the function to be executed.
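The split-and-dispatch logic can be sketched in Python like this. The handler table plays the role of the cu object. The bake handler here is a hypothetical stand-in that only records the cookie value. Instructions without a handler, such as pnf, are simply reported.

```python
from typing import Callable, Dict, List, Tuple


def parse_instruction(instruction: str) -> Tuple[str, List[str]]:
    """Split 'name|arg1|arg2' into the handler name and its arguments."""
    name, *args = instruction.split("|")
    return name, args


def run_instructions(do: List[str], handlers: Dict[str, Callable[..., None]]) -> List[str]:
    """Call the matching handler for each instruction; return names that had none."""
    unknown = []
    for instruction in do:
        name, args = parse_instruction(instruction)
        handler = handlers.get(name)
        if handler is None:
            unknown.append(name)
        else:
            handler(*args)
    return unknown


jar: Dict[str, str] = {}


def bake(name: str, max_age: str, value: str, *rest: str) -> None:
    """Hypothetical stand-in for cu.bake: just records the cookie value."""
    jar[name] = value


leftover = run_instructions(["bake|_px2|330|<jwt>|true|300", "pnf|cu"], {"bake": bake})
print(jar, leftover)  # {'_px2': '<jwt>'} ['pnf']
```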

Searching for bake in the deobfuscated PerimeterX script, we discover the cu object. This cu object holds the handler for the bake instruction:

var cu = { 
	/** 
	 * @param n = "_px2" 
	 * @param r = "330" 
	 * @param t = "<jwt>"
	 * @param a = "true" 
	 * @param e = "300" 
	 */ 
	bake: function (n, r, t, a, e) { 
		if (ff.J === window._pxAppId) { 
			wt(n, r, t, a); 
		} 
		/* ... */ 
	}, 
	/* ... */ 
};

The arguments n, r, t, a, and e all take on the values of "_px2", "330", "<jwt>", "true", and "300" respectively.

The bake method calls a function, wt. Let's look up the definition of that too:

/** 
 * @param n = "_px2" 
 * @param r = "330" 
 * @param t = "<jwt>"
 * @param a = "true" 
 */ 
 
function wt(n, r, t, a) { 
	/* ...*/ 
 
	try { 
		var i; 
		// Creates the expiry date of the cookie, based on the "r" parameter. 
		if (r !== null) { 
			i = new Date(+new Date() + 1000 * r).toUTCString().replace(/GMT$/, "UTC"); 
		} 
		// Initialize the _px2 cookie string 
		var o = n + "=" + t + "; expires=" + i + "; path=/"; 
		var c = (a === true || a === "true") && bt(); 
		// Append the site domain to the cookie, and add the cookie to document.cookie 
		if (c) o = o + "; domain=" + c;
		document.cookie = o + "; " + e; // e comes from the elided part of the function
		return true; 
	} catch (n) { 
		return false; 
	} 
}

So, it looks like the bake instruction directly sets the _px2 cookie! It's also a play on words, as in baking cookies.
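Here's a sketch, ported to Python, of the cookie string wt() produces. It leaves out the extra attributes appended from the elided part of the function (the e variable). The domain argument is a hypothetical stand-in for whatever bt() returns.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional, Union


def bake_cookie_string(name: str, max_age: Union[str, int], value: str,
                       use_domain: Union[str, bool], now: datetime,
                       domain: Optional[str] = None) -> str:
    """Rebuild the string wt() writes to document.cookie (minus the elided extras).

    `now` must be a UTC-aware datetime; `domain` stands in for bt().
    """
    expires = now + timedelta(seconds=int(max_age))  # JS: +new Date() + 1000 * r
    # toUTCString() ends in "GMT"; wt() swaps that suffix for "UTC".
    stamp = format_datetime(expires, usegmt=True)[: -len("GMT")] + "UTC"
    cookie = f"{name}={value}; expires={stamp}; path=/"
    if use_domain in (True, "true") and domain:
        cookie += f"; domain={domain}"
    return cookie


now = datetime.fromtimestamp(1663263533114 / 1000, tz=timezone.utc)
print(bake_cookie_string("_px2", "330", "<jwt>", "true", now, domain="example.com"))
```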

Congrats! You found where in the code their main anti-bot cookie is being set! The next step will be to calculate values for it that make sense to PerimeterX, so your bot does not get flagged as suspicious.

You should note that the cu object contains handlers for all other possible do instructions, too! To create a PerimeterX bypass, you need to reverse engineer the functionality of each do instruction.

Let's look at how some of the client-side security checks in the challenge script work.

WebGL Fingerprinting

In the snippet below, PerimeterX uses WebGL APIs to create and render an image. The hash of the image is then stored in canvasfp:

// This function creates, renders, and hashes the image to construct "canvasfp". 
 
function A() { 
	return new T(function (c) { 
		setTimeout(function () { 
			try { 
				a = n.createBuffer(); 
				n.bindBuffer(n.ARRAY_BUFFER, a); 
				var u = new Float32Array([ 
					-0.2, -0.9, 0, 0.4, -0.26, 0, 0, 0.732134444, 0, 
				]); 
				n.bufferData(n.ARRAY_BUFFER, u, n.STATIC_DRAW);
				a.itemSize = 3;
				a.numItems = 3;
				e = n.createProgram();
				i = n.createShader(n.VERTEX_SHADER);
				n.shaderSource(
					i,
					"attribute vec2 attrVertex;varying vec2 varyinTexCoordinate;uniform vec2 uniformOffset;void main(){varyinTexCoordinate=attrVertex+uniformOffset;gl_Position=vec4(attrVertex,0,1);}"
				);
				/* Some more transformations on the canvas image... */
				/* ... */
				n.drawArrays(n.TRIANGLE_STRIP, 0, a.numItems);
				// In() computes a hash of the generated image
				r.canvasfp = n.canvas === null ? "no_fp" : In(n.canvas.toDataURL());
				r.extensions = n.getSupportedExtensions() || ["no_fp"];
			} catch (n) { 
				r.errors.push("PX10703"); 
			} 
			/* ... */ 
		}, 1); 
	}); 
}

This is useful for fingerprinting because, even when instructed to draw the exact same image, slight variations in hardware or low-level software (e.g., graphics drivers or operating systems) produce a different output and, thus, a different hash. This makes WebGL fingerprinting a good way to classify devices.

PerimeterX also collects various other WebGL properties to better classify your device. With machine learning, it can analyze this data to detect whether you're spoofing WebGL properties or rendering.

The computed canvasfp, along with the additional WebGL properties, is added to the payload object in the snippet below:

// Adding the collected WebGL data to the POST request payload 
 
(function (t) { 
	a.PX10061 = t.canvasfp;
	a.PX11016 = t.webglVendor;
	a.PX10529 = t.errors;
	a.PX10279 = t.webglRenderer;
	a.PX10753 = t.webGLVersion;
	a.PX10246 = t.extensions;
	a.PX11232 = In(t.extensions);
	a.PX10871 = t.webglParameters;
	a.PX11231 = In(t.webglParameters);
	a.PX11077 = t.unmaskedVendor;
	a.PX10165 = t.unmaskedRenderer;
	a.PX10244 = t.shadingLangulageVersion;
	tt("PX11223"); 
	r(a); 
});

Automated Browser Checks

Below, PerimeterX is checking for the existence of automated-browser-specific properties:

try { 
	n.PX10010 = !!window.emit;
	n.PX10225 = !!window.spawn;
	n.PX10855 = !!window.fmget_targets;
	n.PX11065 = !!window.awesomium;
	n.PX10456 = !!window.__nightmare;
	n.PX10441 = Xr(window.RunPerfTest);
	n.PX10098 = !!window.geb;
	n.PX10557 = !!window._Selenium_IDE_Recorder;
	n.PX10170 = !!window._phantom || !!window.callPhantom;
	n.PX10824 = !!document.__webdriver_script_fn;
	n.PX10087 = !!window.domAutomation || !!window.domAutomationController;
	n.PX11042 =
		window.hasOwnProperty("webdriver") ||
		!!window["webdriver"] ||
		document.getElementsByTagName("html")[0].getAttribute("webdriver") === "true";
} catch (n) {}

Sandboxing Checks

PerimeterX checks for the existence of Node.js-only APIs to determine whether the script is running in a sandbox instead of a real browser:

var n; 
// The process object only exists in NodeJS. 
try { 
	n = 
		n || 
		((typeof process == "undefined" ? "undefined" : A(process)) === "object" && 
			String(process) === "[object process]"); 
} catch (n) {} 
 
try { 
	n = n || /node|io\.js/.test(process.release.name) === true; 
} catch (n) {}

To make sure built-in functions haven't been modified (i.e. monkey-patched), PerimeterX calls typeof and an implicit toString on them:

// A() acts as a wrapper for "typeof" 
 
function A(n) { 
	A = 
		typeof Symbol == "function" && typeof Symbol.iterator == "symbol" 
			? function (n) { 
					return typeof n; 
				} 
			: function (n) { 
					return n && 
						typeof Symbol == "function" && 
						n.constructor === Symbol && 
						n !== Symbol.prototype 
						? "symbol" 
						: typeof n; 
				}; 
	return A(n); 
} 
// 
function Xr(n) { 
	// When typeof is called on an unmodified built-in function, it will return "function". 
	// "" + n is an implicit toString() 
	// An unmodified built-in function will always include "[native code]" in the result. 
	return A(n) === "function" && /\{\s*\[native code\]\s*\}/.test("" + n); 
} 
 
/* ... */ 
 
// Later used like this: 
 
n.PX10213 = Xr(window.EventSource); 
n.PX10283 = Xr(Function.prototype.bind); 
n.PX10116 = Xr(window.setInterval); 
 
// If they haven't been modified, all the above calls should return true.
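To see why a monkey-patched function fails this test, here is the same regular expression applied in Python to two sample toString() outputs. The sample strings are illustrative. The typeof half of Xr() has no Python equivalent and is omitted.

```python
import re

# Same pattern as Xr(): an untouched built-in stringifies to "... { [native code] }".
NATIVE_CODE = re.compile(r"\{\s*\[native code\]\s*\}")


def looks_native(function_source: str) -> bool:
    """Python mirror of the toString() half of Xr()."""
    return NATIVE_CODE.search(function_source) is not None


print(looks_native("function setInterval() { [native code] }"))  # True
print(looks_native("function (handler, ms) { return realSetInterval(handler, ms); }"))  # False
```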

User Input Event Tracking

PerimeterX collects behavioral biometrics, such as mouse movements, keyboard presses, and touch movements. The collected data can then be analyzed with machine learning to determine if the inputs are human-like, or generated by a bot.

In this snippet, PerimeterX records the movement, position, and timing of mouse and touch events:

{ 
	(function (n, r) { 
		_i.length < 10 && 
			_i.push( 
				+n.movementX.toFixed(2) + "," + +n.movementY.toFixed(2) + "," + wr(r) 
			); 
 
		if (n && n.movementX && n.movementY) { 
			if (Pi.length < 50) { 
				Pi.push( 
					(function (n) { 
						var r = n.touches || n.changedTouches; 
						var t = r && r[0]; 
						var a = +(t ? t.clientX : n.clientX).toFixed(0); 
						var e = +(t ? t.clientY : n.clientY).toFixed(0); 
 
						var i = (function (n) { 
							return +(n.timestamp || n.timeStamp || 0).toFixed(0); 
						})(n); 
 
						return "".concat(a, ",").concat(e, ",").concat(i); 
					})(n) 
				); 
			} 
		} 
	})(n, t); 
}
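For reference, the position-and-timing part of this snippet can be ported to Python as below. It mirrors the movementX/movementY guard and the 50-sample cap on Pi. It also mirrors toFixed(0), which rounds ties away from zero, unlike Python's round(). The event dictionaries are assumptions of the sketch that stand in for DOM events. The _i movement samples aren't ported.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Any, List, Mapping


def to_fixed0(x: float) -> int:
    """Emulate JavaScript's +x.toFixed(0): ties round away from zero."""
    return int(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP))


def pointer_sample(event: Mapping[str, Any]) -> str:
    """Format an event as "x,y,timestamp", preferring the first touch point."""
    touches = event.get("touches")
    if touches is None:
        touches = event.get("changedTouches")
    point = touches[0] if touches else event
    stamp = event.get("timestamp") or event.get("timeStamp") or 0
    return f"{to_fixed0(point['clientX'])},{to_fixed0(point['clientY'])},{to_fixed0(stamp)}"


def record(samples: List[str], event: Mapping[str, Any], limit: int = 50) -> None:
    """Store a sample only for events with movement, and only up to `limit` (Pi's cap)."""
    if event.get("movementX") and event.get("movementY") and len(samples) < limit:
        samples.append(pointer_sample(event))


samples: List[str] = []
record(samples, {"movementX": 3, "movementY": -2, "clientX": 10.4, "clientY": 20.5, "timeStamp": 1234.56})
```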

Conclusion

As you can see, different methods exist to bypass the PerimeterX bot detection system, some being more reliable than others. While reverse engineering the PerimeterX JavaScript challenge is one of the paths, it can get tedious depending on the level of obfuscation.

However, with APIs and smart proxies like ZenRows, you can scrape data from any website with a few lines of code. You can try it by signing up to get your free API key right away.

Keep in mind that PerimeterX frequently updates its challenges, so if you opt for deobfuscation, you must constantly check your scraper to avoid detection.

Hopefully, this tutorial helped you with your web scraping projects. If you want to learn more techniques for bypassing bot defenses, you can check our detailed guides on Cloudflare bypass and Akamai bypass.
