
What Is TLS Fingerprint and How to Bypass It

November 23, 2022 · 6 min read

Protecting against malicious attacks and detecting bots are basic tasks for every modern website.

TLS fingerprinting is one such technique: it lets a server identify which web client is initiating a connection and then decide whether to allow the request. Although it isn't aimed at ethical data extraction tools, your scraper can still get caught in the crossfire and blocked.

In this article, we'll guide you through how to bypass TLS fingerprinting for ethical web scraping purposes.

Let's dive in!

What Is TLS Fingerprinting?

This is a popular server-side fingerprinting technique. It lets web servers identify which web client is connecting with a high degree of accuracy, using only the parameters in the first packet of the connection, before any application data is exchanged. The clients can be browsers, CLI tools, scripts (bots), or practically any app that initiates a request. Solutions like Cloudflare use TLS fingerprinting to identify and block malicious traffic.

A different type of fingerprinting is client-side fingerprinting, which involves testing the client using JavaScript. However, this is a discussion for another article.

Let's get back to it:

TLS (Transport Layer Security) is an encryption protocol that secures connections between web clients and servers. It evolved from SSL, and although the two names are often used interchangeably, TLS is now the most widely used security protocol for web communication.

When a web scraper sends a request to an HTTPS website, the connection is secured with TLS. That normally makes no difference to a scraper, but a website that uses TLS fingerprinting can identify your client as a bot from the handshake alone and deny you access completely.

How Does TLS Fingerprinting Work?

For a client to communicate with a server over a secure channel, both parties must agree on the encryption algorithms and cryptographic keys for that conversation. They reach this agreement through the TLS handshake: the sequence of messages in which they exchange the information required to establish a secure connection.

The client opens the handshake with a Client Hello message, in which it declares the TLS parameters it supports. These include:

  • The highest TLS version it supports (TLS 1.0–TLS 1.3).
  • A list of cipher suites, that is, the combinations of cryptographic algorithms it can use to secure the connection.
  • A list of supported extensions.
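In Node.js, the runtime's defaults determine much of what goes into this message. The sketch below, assuming Node.js 12 or later, reads the default TLS version range and the cipher string Node.js starts from. The cipher string is an OpenSSL-style list, and the extensions Node.js sends are not exposed this way. `describeNodeTlsDefaults` is a hypothetical helper name.

```javascript
const tls = require('tls');
const crypto = require('crypto');

// Hypothetical helper: gathers the defaults Node.js uses when it builds a Client Hello
function describeNodeTlsDefaults() {
  return {
    // lowest and highest protocol versions offered by default
    minVersion: tls.DEFAULT_MIN_VERSION,
    maxVersion: tls.DEFAULT_MAX_VERSION,
    // OpenSSL cipher string, split into its individual entries
    cipherEntries: crypto.constants.defaultCipherList.split(':'),
  };
}

const defaults = describeNodeTlsDefaults();
console.log(`Versions: ${defaults.minVersion} to ${defaults.maxVersion}`);
console.log('First cipher entries:', defaults.cipherEntries.slice(0, 3));
```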

Clients are built on different TLS libraries: Firefox uses NSS, Chrome uses BoringSSL, Python (like Node.js) uses OpenSSL, and Safari uses Secure Transport. As a result, the values of these parameters, and their order, differ significantly from client to client.

Since this message isn't encrypted, we can inspect it with network analysis tools like Wireshark. Below is a TLS Client Hello message sent by Chrome to Wikipedia.

[Image: TLS Client Hello message sent by Chrome to Wikipedia, captured in Wireshark]

The fields in this message (shown as expandable trees in Wireshark), their contents, and their order, particularly the order of the cipher suites, differ from one web client to another. Here's what they look like for Chrome:

[Image: Chrome's Client Hello fields in Wireshark, including its cipher suites and extensions]

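The TLS protocol is complex and carries a lot of information, as the number of extensions in the example above shows, and each extension contains its own set of parameters. For instance, some clients, such as Chrome, send GREASE values: reserved, meaningless cipher suite and extension IDs that keep servers tolerant of values they don't recognize.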
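GREASE values follow a fixed pattern defined in RFC 8701 (0x0A0A, 0x1A1A, and so on up to 0xFAFA), and JA3 ignores them when it builds a fingerprint, so the values a client picks don't change the hash. A minimal sketch; `isGrease` and `stripGrease` are hypothetical helper names:

```javascript
// GREASE values (RFC 8701) have the form 0x?A?A with both bytes identical:
// 0x0A0A, 0x1A1A, 0x2A2A, ..., 0xFAFA
function isGrease(value) {
  const sameBytes = (value >> 8) === (value & 0xff);
  return sameBytes && (value & 0x0f0f) === 0x0a0a;
}

// Drops GREASE values from a list of cipher suite or extension IDs,
// as JA3 does before building its string
function stripGrease(values) {
  return values.filter(value => !isGrease(value));
}

console.log(stripGrease([0x0a0a, 4865, 4866, 0xdada, 4867])); // [ 4865, 4866, 4867 ]
```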

From all this information, the server can compute a TLS signature, also known as the client's "fingerprint," and use it to infer which client is connecting before any application data is sent. This is where the blocking occurs.

So, how is this signature created? Understanding what goes on behind the scenes will make your bypass more likely to succeed.

Frustrated that your web scrapers are blocked again and again?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

TLS Signature Calculation

TLS fingerprinting is based on parameters in the unencrypted Client Hello message. By taking the IDs of those parameters in the order the client sends them and hashing the resulting string, we get a fingerprint that characterizes the client. The de facto standard algorithm for this is JA3.

JA3 takes the decimal values of five fields in the message, concatenates them into a single string, and hashes it. These fields are:

  • TLS version
  • Cipher suites
  • Extensions
  • Elliptic curves
  • Elliptic curve point formats

Fields are separated by commas (","), and the values within a field are separated by dashes ("-"). The resulting string is then hashed with MD5 to produce the JA3 fingerprint.
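To make the construction concrete, here is a minimal sketch in Node.js. `ja3String` and `ja3Hash` are hypothetical helper names, the input values are illustrative rather than taken from a real capture, and GREASE values are assumed to have been removed already.

```javascript
const crypto = require('crypto');

// Joins the five JA3 fields: values inside a field with '-', fields with ','
// (assumes GREASE values were already removed from each list)
function ja3String({ version, ciphers, extensions, curves, pointFormats }) {
  return [
    String(version),
    ciphers.join('-'),
    extensions.join('-'),
    curves.join('-'),
    pointFormats.join('-'),
  ].join(',');
}

// The JA3 fingerprint is the MD5 hex digest of the JA3 string
function ja3Hash(ja3) {
  return crypto.createHash('md5').update(ja3).digest('hex');
}

// Illustrative values only, not a real client
const hello = {
  version: 771,
  ciphers: [4865, 4866],
  extensions: [0, 23],
  curves: [29, 23],
  pointFormats: [0],
};
const ja3 = ja3String(hello);
console.log(ja3); // 771,4865-4866,0-23,29-23,0
console.log(ja3Hash(ja3));
```

Because the values are joined in the order they appear, offering the same ciphers in a different order yields a different string and therefore a different MD5 hash.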

Since JA3 is integrated into Wireshark, we can read the full JA3 string and fingerprint for our previous example:

Example
771,4867-4865-4866-52393-52392-49195-49199-49196-49200-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-21-41,29-23-24,0

Here's its MD5 fingerprint:

Example
fedca33016b974c390faa610378b5a62
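Going the other way, splitting on commas and then on dashes recovers the five fields, which makes the string above easier to read. The first field, 771 (0x0303), is the legacy version value that TLS 1.3 clients also send; they announce TLS 1.3 support through the supported_versions extension (43), which appears in the extension list. A short sketch, with `parseJa3` as a hypothetical helper name:

```javascript
// Splits a JA3 string back into its five fields (hypothetical helper)
function parseJa3(ja3) {
  const [version, ciphers, extensions, curves, pointFormats] = ja3.split(',');
  // an empty field means the client sent no values for it
  const toNumbers = field => (field ? field.split('-').map(Number) : []);
  return {
    version: Number(version),
    ciphers: toNumbers(ciphers),
    extensions: toNumbers(extensions),
    curves: toNumbers(curves),
    pointFormats: toNumbers(pointFormats),
  };
}

// JA3 string from the Chrome capture above
const chrome = parseJa3(
  '771,4867-4865-4866-52393-52392-49195-49199-49196-49200-49171-49172-156-157-47-53,' +
  '0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-21-41,29-23-24,0'
);
console.log(chrome.version, chrome.ciphers.length, chrome.extensions.length,
  chrome.curves.length, chrome.pointFormats.length); // 771 15 17 3 1
```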

In a nutshell, each client (more precisely, each client version and its TLS library) produces a characteristic fingerprint, so detecting which one is making an HTTPS request can be as simple as matching its signature against a database. Here are some examples of clients' signatures:

Example
Firefox 94: 2312b27cb2c9ea5cabb52845cafff423 
Firefox 87: bc6c386f480ee97b9d9e52d472b772d8 
Chrome 97: b32309a26951912be7dba376398abc3b 
Chrome 70: 5353c0796e25725adfdb93f35f5a18f7
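On the server side, the simplest form of this check is a table lookup. A sketch, assuming a hypothetical `identifyClient` function and a table that holds only the four signatures listed above; a real deployment would need a much larger table.

```javascript
// Hypothetical lookup table built from the example signatures above
const knownFingerprints = new Map([
  ['2312b27cb2c9ea5cabb52845cafff423', 'Firefox 94'],
  ['bc6c386f480ee97b9d9e52d472b772d8', 'Firefox 87'],
  ['b32309a26951912be7dba376398abc3b', 'Chrome 97'],
  ['5353c0796e25725adfdb93f35f5a18f7', 'Chrome 70'],
]);

// Returns the matching client name, or null for an unknown fingerprint
function identifyClient(ja3Hash) {
  return knownFingerprints.get(ja3Hash.toLowerCase()) ?? null;
}

console.log(identifyClient('b32309a26951912be7dba376398abc3b')); // Chrome 97
console.log(identifyClient('00000000000000000000000000000000')); // null
```

A scraper whose fingerprint maps to a known script or library, or to nothing at all, is what such a check would single out.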

How to Bypass TLS Fingerprinting

Let's try it for ourselves!

We'll use Node.js to build a scraper and try to get past this anti-bot technique.

To follow this tutorial, you'll need Node.js and npm installed (some systems come with them pre-installed). Then create a project and install Axios, the HTTP client we'll use:

Terminal
npm init -y 
npm install axios

Next, print the default cipher list that Node.js offers, one entry per line:

Terminal
node -p crypto.constants.defaultCoreCipherList | tr ':' '\n'
Output
TLS_AES_256_GCM_SHA384 
TLS_CHACHA20_POLY1305_SHA256 
TLS_AES_128_GCM_SHA256 
ECDHE-RSA-AES128-GCM-SHA256 
ECDHE-ECDSA-AES128-GCM-SHA256 
ECDHE-RSA-AES256-GCM-SHA384 
ECDHE-ECDSA-AES256-GCM-SHA384 
DHE-RSA-AES128-GCM-SHA256 
ECDHE-RSA-AES128-SHA256 
DHE-RSA-AES128-SHA256 
ECDHE-RSA-AES256-SHA384 
DHE-RSA-AES256-SHA384 
ECDHE-RSA-AES256-SHA256 
DHE-RSA-AES256-SHA256 
HIGH 
!aNULL 
!eNULL 
!EXPORT 
!DES 
!RC4 
!MD5 
!PSK 
!SRP 
!CAMELLIA

Reordering the cipher suites in this list changes the order in which Node.js offers them in its Client Hello, and therefore its JA3 fingerprint. But how should you go about these changes?

If you're familiar with ciphers, you'll notice that the first three are TLS 1.3 cipher suites, all of them highly recommended. Modern clients typically offer them first, though in different orders.
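Not every entry in Node's default list is a single cipher suite. Entries prefixed with TLS_ are the TLS 1.3 suites, while entries such as HIGH or !aNULL are OpenSSL cipher-string keywords that select or exclude whole groups of suites. The sketch below separates the three kinds with a simple naming heuristic (it assumes TLS 1.2 suite names contain a dash, which holds for the default list shown above); `classifyCipherEntries` is a hypothetical helper name.

```javascript
const crypto = require('crypto');

// Heuristic split of an OpenSSL cipher string into its three kinds of entries
function classifyCipherEntries(cipherString) {
  const entries = cipherString.split(':');
  const isTls13 = entry => entry.startsWith('TLS_');
  return {
    // TLS 1.3 suites, e.g. TLS_AES_256_GCM_SHA384
    tls13: entries.filter(isTls13),
    // named TLS 1.2 suites, e.g. ECDHE-RSA-AES128-GCM-SHA256
    tls12: entries.filter(entry => !isTls13(entry) && entry.includes('-')),
    // group keywords and exclusions, e.g. HIGH, !aNULL
    keywords: entries.filter(entry => !isTls13(entry) && !entry.includes('-')),
  };
}

const { tls13, tls12, keywords } = classifyCipherEntries(crypto.constants.defaultCoreCipherList);
console.log('TLS 1.3 suites:', tls13);
console.log('Named TLS 1.2 suites:', tls12.length);
console.log('Keywords:', keywords);
```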

Let's go back to our Wireshark capture. We can see that Chrome uses these same ciphers as the first options, but in this exact order:

[Image: Wireshark capture showing the order of Chrome's first cipher suites]

It's safe to leave the first three as they are and shuffle the remaining ones. The following configuration does that in Node.js, so your requests no longer present Node's default cipher order:

program.js
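const crypto = require('crypto');
const https = require('https');
const axios = require('axios');

// Node's default cipher list, in the order Node offers it
const nodeOrderedCipherList = crypto.constants.defaultCipherList.split(':');

// keep the most important ciphers (the three TLS 1.3 suites) in the same order
const fixedCipherList = nodeOrderedCipherList.slice(0, 3);

// shuffle the rest
const shuffledCipherList = nodeOrderedCipherList.slice(3)
  .map(cipher => ({ cipher, sort: Math.random() }))
  .sort((a, b) => a.sort - b.sort)
  .map(({ cipher }) => cipher);

// rebuild the cipher string and use it for every request sent through this agent
const httpsAgent = new https.Agent({
  ciphers: [...fixedCipherList, ...shuffledCipherList].join(':'),
});

// the target URL is a placeholder
axios.get('https://example.com', { httpsAgent })
  .then(response => console.log(response.status))
  .catch(error => console.error(error.message));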

You can go even further and reorder the first three ciphers as well. Be careful, though: some rearrangements of the cipher list can weaken the security of your requests, so if you're working on a security-sensitive project, research the implications first.
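If you do reorder the first three, one option is to copy the order seen in the Chrome capture's JA3 string above (4867, 4865, 4866). The sketch below maps those IANA IDs to the TLS 1.3 suite names from RFC 8446 and moves only those three entries; it never adds or removes a suite, so the set of ciphers offered stays the same. `reorderTls13` is a hypothetical helper name, and there is no guarantee that this order helps against a given anti-bot system.

```javascript
const crypto = require('crypto');
const https = require('https');

// IANA IDs of the TLS 1.3 cipher suites (RFC 8446), as they appear in a JA3 string
const TLS13_SUITE_NAMES = {
  4865: 'TLS_AES_128_GCM_SHA256',       // 0x1301
  4866: 'TLS_AES_256_GCM_SHA384',       // 0x1302
  4867: 'TLS_CHACHA20_POLY1305_SHA256', // 0x1303
};

// Moves the TLS 1.3 suites to the front in the order given by idOrder;
// all other entries keep their relative order
function reorderTls13(cipherEntries, idOrder) {
  const wanted = idOrder.map(id => TLS13_SUITE_NAMES[id]);
  for (const suite of wanted) {
    // only reorder suites already on the list, never add new ones
    if (!cipherEntries.includes(suite)) {
      throw new Error(`${suite} is not in the current cipher list`);
    }
  }
  const rest = cipherEntries.filter(entry => !wanted.includes(entry));
  return [...wanted, ...rest];
}

const reordered = reorderTls13(crypto.constants.defaultCipherList.split(':'), [4867, 4865, 4866]);
const httpsAgent = new https.Agent({ ciphers: reordered.join(':') });
console.log(reordered.slice(0, 3));
```

The resulting `httpsAgent` can be passed to Axios the same way as in program.js.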

Also, bypassing more complex solutions, such as Akamai's fingerprinting, is a much more challenging endeavor. The goal here is simply to make sure your fingerprint isn't so rare that it gets blocklisted.

Other Ways to Bypass TLS Fingerprinting

Of course, you have other methods available for running a TLS fingerprint bypass. Let's explore the most popular ones:

Headless Browsers

When you scrape with a browser running in headless mode, the TLS handshake is performed by the browser itself, so you get its fingerprint and the web server sees you as a browser client.

Python

You can bypass TLS fingerprint detection in Python by spoofing the cipher suites and TLS version through a custom HTTPAdapter mounted on a requests session.

Java

Reconfigure the list of enabled cipher suites, for example with the ssl-config.enabledCipherSuites setting, and you're good to go. Find out more in its documentation.

Go

In Go, you can fake a JA3 signature by spoofing the five Client Hello fields that the JA3 algorithm uses to identify TLS signatures. Go libraries such as Refraction Networking's utls or ja3transport make this possible.

Conclusion

Every scraper needs to avoid getting blocked. In this article, you learned how TLS fingerprinting, one of the most commonly employed anti-bot detection techniques, works and how to bypass it.

While escaping it is no mean feat, you can succeed by changing your fingerprint to one that isn't blocklisted.

Keep in mind that these tips might not be enough for your project, especially a large-scale one, and you could still end up blocked. Don't waste precious time and resources! ZenRows' scraping API can handle thousands of requests per second while bypassing TLS fingerprinting, anti-bots, CAPTCHAs, and other protection techniques. Try it for free today!
