Protecting against malicious attacks and detecting bots are among the basic tasks of any modern website.
TLS fingerprint analysis is one such technique: it lets servers identify which web client is trying to initiate a conversation and then decide whether to allow the request. While its target isn't ethical data extraction tools, your scraper might still get caught in the crossfire and get blocked.
In this article, we'll guide you through how to bypass TLS fingerprinting for ethical web scraping purposes.
Let's dive in!
What Is TLS Fingerprinting?
This is a popular server-side fingerprinting technique. It enables web servers to determine a web client's identity to a high degree of accuracy, using only the parameters in the first packet of the connection, before any application data is exchanged. The clients can be browsers, CLI tools, scripts (bots), and practically any app initiating a request. Solutions like Cloudflare use TLS fingerprinting to identify and block malicious attacks.
A different type of fingerprinting is client-side fingerprinting, which involves testing the client using JavaScript. However, this is a discussion for another article.
Let's get back to it:
TLS (Transport Layer Security) is an encryption protocol to secure connections between web clients and servers. While it's often used interchangeably with SSL, it evolved from SSL and is now the most widely used web communication security protocol.
When a web scraper sends a request to an HTTPS website, it does so over TLS. That wouldn't normally matter to scrapers, but websites that use TLS fingerprinting may not only flag you as a malicious bot but also deny you access completely.
How Does TLS Fingerprint Work?
For a client to communicate with a server over a secure channel, both parties must agree on that conversation's encryption algorithm and cryptographic keys. This agreement is reached through a TLS handshake: that's the entire sequence where they exchange essential information required to establish a secure connection.
Typically, the client's first step is the client hello message, in which it declares the set of TLS parameters it supports. Some of these include:
- The max TLS version it supports (TLS 1.0–TLS 1.3).
- A list of cipher suites, that is, the cryptographic algorithms it can use for encryption.
- A list of supported extensions.
Each client uses a different TLS library. Firefox uses NSS, Chrome uses BoringSSL, Python uses OpenSSL, and Safari uses Secure Transport. Therefore, the values of these parameters differ significantly per client.
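To get a feel for these parameters, here's a minimal sketch that prints the defaults Node.js itself works with. It relies only on the built-in tls module. The clientDefaults object is just an illustrative name, and the exact values depend on your Node.js and OpenSSL versions.

```javascript
const tls = require('tls');

// Illustrative summary object (the name is ours, not part of any API)
const clientDefaults = {
  // Lowest and highest protocol versions Node.js negotiates by default
  minVersion: tls.DEFAULT_MIN_VERSION,
  maxVersion: tls.DEFAULT_MAX_VERSION,
  // Default curve setting used for ECDH key agreement
  ecdhCurve: tls.DEFAULT_ECDH_CURVE,
  // Every cipher this build supports (a superset of what is offered by default)
  supportedCiphers: tls.getCiphers(),
};

console.log(clientDefaults.minVersion, clientDefaults.maxVersion);
console.log(clientDefaults.ecdhCurve);
console.log(`${clientDefaults.supportedCiphers.length} supported ciphers`);
```

A browser built on a different TLS library would report a different set of values, and that difference is what fingerprinting picks up on.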
Since this message isn't encrypted, we can view it using NSM tools like Wireshark. Below is a TLS client hello message sent by Chrome to Wikipedia.
The trees, their content and their order, particularly that of the cipher suites, differ depending on the web client. Here's what those look like for Chrome:
The TLS protocol is complex with lots of information, as we can see from the number of extensions in the example above. Each one contains its own set of parameters. For instance, some clients support the fake TLS extension, GREASE.
Using all this information, TLS fingerprinting enables the calculation of the TLS signature, also known as the client's "fingerprint." The server then uses this signature to infer the client before sending any data. And this is where the blocking occurs.
So, how is this signature created? If we understand what goes on behind the scenes, we can execute a bypass with greater success.
TLS Signature Calculation
TLS fingerprinting is based on parameters in the unencrypted client hello message. By taking each parameter's ID in order and hashing the resulting string, we can get a unique fingerprint. The de-facto standard algorithm for generating this is known as JA3.
JA3 works by concatenating the decimal values of the bytes of five fields in the message and then hashing them. These fields are:
- TLS version
- Cipher suites
- Extensions
- Elliptic curves
- Elliptic curve point formats
The fields are separated by commas (","), while dashes ("-") divide the values within each field. This string is then hashed with MD5 to generate the JA3 fingerprint.
Since JA3 is integrated into Wireshark, we can see our full string and fingerprint from our previous example:
771,4867-4865-4866-52393-52392-49195-49199-49196-49200-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-21-41,29-23-24,0
Here's the MD5 Fingerprint:
fedca33016b974c390faa610378b5a62
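The calculation can be sketched in a few lines of Node.js. This is a minimal illustration, not a reference implementation: the helper names (isGrease, ja3String, ja3Hash) and the shape of the hello object are our own assumptions, and the field values are copied from the string above. JA3 ignores GREASE values, so the sketch filters them out before joining.

```javascript
const crypto = require('crypto');

// GREASE values look like 0x0a0a, 0x1a1a, ... 0xfafa; JA3 skips them
function isGrease(value) {
  return (value & 0x0f0f) === 0x0a0a && (value >> 8) === (value & 0xff);
}

// Join the values of one field with dashes, dropping GREASE entries
function joinField(values) {
  return values.filter((v) => !isGrease(v)).join('-');
}

// Build the comma-separated JA3 string from the five client hello fields
function ja3String({ version, ciphers, extensions, curves, pointFormats }) {
  return [
    version,
    joinField(ciphers),
    joinField(extensions),
    joinField(curves),
    joinField(pointFormats),
  ].join(',');
}

// Hash the JA3 string with MD5 to get the 32-character fingerprint
function ja3Hash(ja3) {
  return crypto.createHash('md5').update(ja3).digest('hex');
}

// Field values taken from the example string above
const hello = {
  version: 771,
  ciphers: [4867, 4865, 4866, 52393, 52392, 49195, 49199, 49196, 49200,
    49171, 49172, 156, 157, 47, 53],
  extensions: [0, 23, 65281, 10, 11, 35, 16, 5, 13, 18, 51, 45, 43, 27,
    17513, 21, 41],
  curves: [29, 23, 24],
  pointFormats: [0],
};

const exampleJa3 = ja3String(hello);
console.log(exampleJa3);
console.log(ja3Hash(exampleJa3));
```

Reordering any of the arrays changes the string, and therefore the hash, which is the property the bypass relies on.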
In a nutshell, each client and version tends to produce its own fingerprint. Detecting which one is making an HTTPS request can be as simple as matching signatures to a database. Here are some examples of clients' signatures:
Firefox 94: 2312b27cb2c9ea5cabb52845cafff423
Firefox 87: bc6c386f480ee97b9d9e52d472b772d8
Chrome 97: b32309a26951912be7dba376398abc3b
Chrome 70: 5353c0796e25725adfdb93f35f5a18f7
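To illustrate how simple that matching can be, here's a sketch of a lookup on the server side. The knownFingerprints map and the identifyClient function are hypothetical names, and the entries are just the four signatures listed above. A real system would use a much larger database.

```javascript
// Hypothetical fingerprint database built from the examples above
const knownFingerprints = new Map([
  ['2312b27cb2c9ea5cabb52845cafff423', 'Firefox 94'],
  ['bc6c386f480ee97b9d9e52d472b772d8', 'Firefox 87'],
  ['b32309a26951912be7dba376398abc3b', 'Chrome 97'],
  ['5353c0796e25725adfdb93f35f5a18f7', 'Chrome 70'],
]);

// Return the client name for a JA3 hash, or 'unknown' if it isn't listed
function identifyClient(ja3Hash) {
  return knownFingerprints.get(ja3Hash.toLowerCase()) ?? 'unknown';
}

console.log(identifyClient('b32309a26951912be7dba376398abc3b'));
console.log(identifyClient('00000000000000000000000000000000'));
```

A server could then allow listed browsers and treat unknown or known-bot signatures with more suspicion.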
How to Bypass TLS Fingerprinting
Let's try it for ourselves!
We'll use Node.js to build a scraper and hopefully win over the anti-bot technique.
To follow this tutorial, you'll need Node.js and npm installed (note that some systems have them pre-installed). Then initialize a project and install the dependencies with npm:
npm init -y
npm install axios
Next, print Node's default cipher list, one cipher per line:
node -p crypto.constants.defaultCoreCipherList | tr ':' '\n'
TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256
TLS_AES_128_GCM_SHA256
ECDHE-RSA-AES128-GCM-SHA256
ECDHE-ECDSA-AES128-GCM-SHA256
ECDHE-RSA-AES256-GCM-SHA384
ECDHE-ECDSA-AES256-GCM-SHA384
DHE-RSA-AES128-GCM-SHA256
ECDHE-RSA-AES128-SHA256
DHE-RSA-AES128-SHA256
ECDHE-RSA-AES256-SHA384
DHE-RSA-AES256-SHA384
ECDHE-RSA-AES256-SHA256
DHE-RSA-AES256-SHA256
HIGH
!aNULL
!eNULL
!EXPORT
!DES
!RC4
!MD5
!PSK
!SRP
!CAMELLIA
If you change the order of the above list in any way, you'll get a new fingerprint. But how should you go about these changes?
If you're familiar with ciphers, you'll notice that the first three are all highly recommended TLS 1.3 ciphers. This means that all modern clients have them as their first options, though in different orders.
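You can check this from Node.js itself. The sketch below separates the TLS 1.3 suites from the rest of the default list, relying on the fact that OpenSSL names TLS 1.3 suites with a TLS_ prefix. The variable names are illustrative, and the exact list depends on your Node.js version.

```javascript
const crypto = require('crypto');

// Node's built-in default cipher list as an array
const allCiphers = crypto.constants.defaultCoreCipherList.split(':');

// TLS 1.3 suites carry a "TLS_" prefix in OpenSSL naming
const tls13Suites = allCiphers.filter((name) => name.startsWith('TLS_'));
// Everything else: older suites, group keywords and "!" exclusions
const olderEntries = allCiphers.filter((name) => !name.startsWith('TLS_'));

console.log('TLS 1.3 suites:', tls13Suites);
console.log('Other entries:', olderEntries.length);
```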
Let's go back to our Wireshark capture. We can see that Chrome uses these same ciphers as the first options, but in this exact order:
Remember that it's safe to leave the first three as they are and shuffle the remaining ones. Fantastic! You can attempt to bypass the TLS fingerprint check in Node.js with this configuration:
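const crypto = require('crypto');
const https = require('https');
const axios = require('axios');
// Node's default cipher list, in its default order
const nodeOrderedCipherList = crypto.constants.defaultCipherList.split(':');
// keep the most important ciphers in the same order
const fixedCipherList = nodeOrderedCipherList.slice(0, 3);
// shuffle the rest
const shuffledCipherList = nodeOrderedCipherList.slice(3)
    .map(cipher => ({ cipher, sort: Math.random() }))
    .sort((a, b) => a.sort - b.sort)
    .map(({ cipher }) => cipher);
// build an HTTPS agent that offers the ciphers in the new order
const agent = new https.Agent({
    ciphers: [...fixedCipherList, ...shuffledCipherList].join(':'),
});
// https://example.com is a placeholder; replace it with your target URL
axios.get('https://example.com', { httpsAgent: agent })
    .then(({ status }) => console.log(status))
    .catch(console.error);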
You can even go further by reordering the first three ciphers. But you should be careful! Some cipher list rearrangements can compromise the security of your requests. Therefore, if you're working on a security-sensitive project, make sure you do your research.
Also, bypassing complex solutions like Akamai fingerprinting is a much more challenging endeavor. The goal here is to ensure your fingerprint is not so rare that it gets blocklisted.
Other Ways to Bypass TLS Fingerprinting
Of course, you have other methods available for running a TLS fingerprint bypass. Let's explore the most popular ones:
Headless Browsers
When you scrape with a browser running in headless mode, your requests carry that browser's fingerprint. This means that the web server sees you as a browser client.
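As a rough sketch, here's what that could look like with Puppeteer. It assumes the third-party puppeteer package is installed (npm install puppeteer). The fetchWithHeadlessBrowser function is a hypothetical helper, and https://example.com is a placeholder URL.

```javascript
// Assumes the third-party puppeteer package is installed
async function fetchWithHeadlessBrowser(url) {
  // Loaded lazily so the module is only required when the function runs
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // The TLS handshake is performed by the launched browser, not by Node.js
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    // Always release the browser process, even if navigation fails
    await browser.close();
  }
}

// Usage (placeholder URL):
// fetchWithHeadlessBrowser('https://example.com')
//   .then((html) => console.log(html.length))
//   .catch(console.error);
```

The trade-off is cost: a full browser uses far more memory and CPU than a plain HTTP client.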
Python
You can bypass TLS fingerprint detection in Python by spoofing the cipher suites and TLS version using an HTTP adapter with the requests library.
Java
Reconfigure the list of enabled cipher suites using the ssl-config.enabledCipherSuites setting, and you're good to go. Find out more in its documentation.
Go
Go also lets you fake a JA3 signature by spoofing the five client hello fields that the JA3 algorithm uses to identify TLS signatures. This is possible using Golang libraries like Refraction Networking's utls or ja3transport.
Conclusion
All scrapers should know how to scrape a page without getting blocked. Today, you learned how to bypass TLS fingerprinting, one of the most commonly employed anti-bot detection techniques.
While escaping it is no mean feat, you can succeed by changing your fingerprint to one that isn't blocklisted.
Remember that even after applying these tips, they might not work for your project, especially if it's a large-scale one, and you could still end up blocked. Don't waste precious time and resources! ZenRows' scraping API can handle thousands of requests per second while bypassing TLS fingerprinting, anti-bots, CAPTCHA, and other protection techniques. Try it for free today!