Understanding the process of browser fingerprinting can help you build a more efficient web scraper. CreepJS, a testing website, provides a real-life simulation of what happens during browser fingerprinting that allows you to test your scrapers and fix potential leaks.
In this article, you'll learn how CreepJS works and how to leverage its browser fingerprinting analysis to test various browser automation tools.
- How to test browser automation tools on CreepJS.
- Improve CreepJS test results with fortified headless browsers.
- The best tool to bypass browser fingerprinting.
Let's go!
What Is CreepJS?
CreepJS is a browser fingerprinting test website that analyzes anti-fingerprinting extensions and browsers for possible loopholes. It exposes key anti-leak weaknesses in these tools and shows how anti-bot measures can detect them.
While CreepJS is not a direct solution for bot detection, it's a valuable diagnostic tool for web scrapers. The tool helps identify the loopholes to patch, reducing the risk of being blocked. Conversely, it can help anti-bot developers identify bot-like patterns, enabling them to strengthen their detection mechanisms.
Here are the key features of CreepJS:
- Lie Prototyping: CreepJS detects inconsistencies in browser object prototypes altered to impersonate browser fingerprints, such as missing or incorrectly implemented methods.
- Fingerprint Lie Patterns: The website assesses inconsistencies in spoofed browser fingerprints, revealing common "lie patterns". These include mismatched values between related APIs (e.g., WebGL vendor vs. renderer), conflicting hardware metrics (e.g., device memory vs. hardwareConcurrency), etc. Such discrepancies can expose deceptive behavior.
- Fingerprint Extension Code: CreepJS detects the unique code modifications introduced by browser extensions, which may unintentionally reveal their presence.
- Fingerprint Browser Privacy Settings: The website reveals differences in privacy settings, such as blocking cookies or changing headers that unintentionally leave a distinct trace.
- Large-Scale Data Validation: It validates fingerprint data across multiple metrics, such as screen resolution, color depth, supported MIME types, media codecs, and more, to detect contradictions and identify spoofed values.
- Detection of New APIs With High Entropy: CreepJS targets high-entropy APIs whose rich data is more complex to spoof, such as hardware concurrency or WebRTC.
- Use of APIs that Are Difficult to Fake: It relies on robust APIs, like device properties or rendering engines, that provide reliable data and are difficult to manipulate without detection.
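To make the idea of "lie patterns" concrete, here is a minimal sketch of a consistency check between related attributes. It is not CreepJS's actual logic: the function names, the dictionary keys, and the three rules are assumptions chosen for illustration, and the power-of-two rule relies on the Device Memory API reporting rounded values.

```python
import math

# GPU makers we look for in WebGL strings (illustrative, not exhaustive)
GPU_MAKERS = ("nvidia", "intel", "amd", "apple")


def gpu_makers_in(text):
    """Return the set of known GPU makers named in a WebGL string."""
    lowered = text.lower()
    return {maker for maker in GPU_MAKERS if maker in lowered}


def find_lie_patterns(fp):
    """Return a list of contradictions found in a fingerprint dict (hypothetical keys)."""
    lies = []
    user_agent = fp.get("userAgent", "")
    platform = fp.get("platform", "")

    # the User Agent and navigator.platform should describe the same OS
    if "Windows" in user_agent and not platform.startswith("Win"):
        lies.append("User Agent says Windows, platform says " + repr(platform))
    if "Macintosh" in user_agent and not platform.startswith("Mac"):
        lies.append("User Agent says macOS, platform says " + repr(platform))

    # the WebGL vendor and renderer should name the same GPU maker
    vendor_makers = gpu_makers_in(fp.get("webglVendor", ""))
    renderer_makers = gpu_makers_in(fp.get("webglRenderer", ""))
    if vendor_makers and renderer_makers and not (vendor_makers & renderer_makers):
        lies.append("WebGL vendor and renderer name different GPU makers")

    # device memory is expected to be a positive power of two (0.25, 0.5, 1, 2, ...)
    memory = fp.get("deviceMemory")
    if memory is not None and (memory <= 0 or not math.log2(memory).is_integer()):
        lies.append("deviceMemory is not a power of two: " + repr(memory))

    return lies
```

Each rule only compares two values that should agree, which is why a spoofing tool that changes one attribute but forgets its counterpart is easy to spot.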
Now, let's understand the browser fingerprinting steps used by CreepJS.
How CreepJS Tests Browsers
CreepJS evaluates privacy browsers and evasion plugins, particularly those designed to mimic or obfuscate browser fingerprints. For effective reporting, CreepJS simulates typical browser fingerprinting workflows used by anti-bots like Cloudflare:
- Data Collection: CreepJS collects various browser attributes, such as installed plugins, User Agent strings, screen dimensions, canvas rendering, runtime engine, network IP address, WebGL fingerprints, and more.
- Hashing: It then converts the collected attributes into unique hashes using cryptographic or other hashing algorithms to standardize and anonymize data.
- Trust Score Generation: The tool uses fingerprint tracing formulas to calculate trust scores based on the consistency and legitimacy of the hashed attributes. Lower scores may indicate spoofing or non-standard setups.
- Fingerprint Comparison (Crowd Blending Score): CreepJS compares the generated fingerprint against a database of known fingerprints to assess its uniqueness and how well it blends with typical browser fingerprints. Anomalies or similarities to known spoofed fingerprints may indicate attempts to evade detection.
- Bot Detection: CreepJS identifies suspicious behavior by detecting anomalies such as excessive fingerprinting, worker scope tampering, and mismatched request headers. These anomalies indicate potential bot-like activity or automation tools attempting to bypass detection mechanisms.
- Browser Prediction: The website attempts to predict the browser environment behind a fingerprint. To do so, it highlights discrepancies from claimed attributes or expected behavior.
- Reporting: CreepJS concludes its analysis with a detailed report of fingerprint properties within its user interface (UI), including detected anomalies, potential leak points, and computed trust scores.
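The hashing and scoring steps above can be sketched in a few lines. This is an illustration under stated assumptions, not CreepJS's implementation: SHA-256 over canonical JSON stands in for whatever hashing CreepJS uses, and `toy_trust_score` is a hypothetical formula that simply reports the share of consistency checks that passed.

```python
import hashlib
import json


def fingerprint_hash(attributes):
    """Hash a dict of collected attributes into a stable hex digest."""
    # sort keys so the same attributes always serialize to the same string
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def toy_trust_score(total_checks, failed_checks):
    """Hypothetical score: percentage of consistency checks that passed."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    passed = max(total_checks - failed_checks, 0)
    return round(100 * passed / total_checks, 1)
```

Because the digest changes completely when any single attribute changes, two fingerprints can be compared by their hashes alone, without storing or exposing the raw values.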
What Browser Fingerprints Does CreepJS Analyze?
CreepJS analyzes various browser fingerprint data. We'll discuss a few significant ones in this section.
Navigator Properties
CreepJS collects various properties from the browser's window.navigator object to analyze and distinguish browser behaviors.
- Device Memory: Indicates the amount of device memory available.
- Platform: Reveals the operating system, such as "Win32" for Windows or "MacIntel" for macOS.
- MimeTypes: Lists supported media types and plugins.
- Hardware Concurrency: Indicates the number of logical processors available for profiling the device's performance.
- User Agent: Details the browser type, version, and operating system.
- Vendor: Specifies the browser's vendor, such as "Google Inc." for Chrome.
- Automated WebDriver Presence: Detects if a WebDriver is active, which can reveal the use of browser automation tools like Selenium or Playwright.
Although there are more navigator fields than what's on the list, these are the most significant ones.
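To see what these fields look like in your own automated browser, you can read them with a short script. The sketch below assumes a driver object that exposes Selenium's `execute_script` method; the names `NAVIGATOR_SCRIPT`, `collect_navigator_properties`, and `is_automation_flagged` are made up for this example.

```python
# JavaScript that returns the navigator fields listed above as one object
NAVIGATOR_SCRIPT = """
return {
    deviceMemory: navigator.deviceMemory,
    platform: navigator.platform,
    mimeTypes: navigator.mimeTypes.length,
    hardwareConcurrency: navigator.hardwareConcurrency,
    userAgent: navigator.userAgent,
    vendor: navigator.vendor,
    webdriver: navigator.webdriver
};
"""


def collect_navigator_properties(driver):
    """Run the script in the page and return the fields as a Python dict."""
    return driver.execute_script(NAVIGATOR_SCRIPT)


def is_automation_flagged(properties):
    """True when the browser itself reports that a WebDriver controls it."""
    return properties.get("webdriver") is True
```

Calling `collect_navigator_properties(driver)` on a Selenium driver after it has loaded any page returns the values a fingerprinting script would see, so you can compare them against a regular browser.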
WebGL
CreepJS assesses a browser's 2D and 3D graphic rendering capabilities using the Canvas and WebGL APIs to extract unique attributes. Examples of collected fingerprints include the graphics processing unit (GPU) model, version, and vendor.
These WebGL fingerprints can reveal subtle differences in rendering caused by factors like hardware variations, drivers, and browser settings. While headless browsers have rendering capabilities, their output may differ slightly due to software-based rendering or lack of hardware acceleration, which indicates the use of automation tools.
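As a rough illustration of how software-based rendering can give a browser away, the sketch below checks a WebGL renderer string for names of common software rasterizers. The keyword list and the function name are assumptions for this example; real detection logic is more thorough, and a match is only a hint, not proof of automation.

```python
# substrings that usually indicate CPU-based rendering (illustrative list)
SOFTWARE_RENDERER_HINTS = ("swiftshader", "llvmpipe", "microsoft basic render driver")


def looks_software_rendered(renderer):
    """Return True when the WebGL renderer string names a software rasterizer."""
    lowered = renderer.lower()
    return any(hint in lowered for hint in SOFTWARE_RENDERER_HINTS)
```

A renderer string that names a physical GPU passes this check, while one that mentions SwiftShader or llvmpipe suggests the browser is running without hardware acceleration.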
Canvas 2D
CreepJS leverages the HTML5 Canvas 2D API to analyze how a browser renders shapes, pixels, and texts. The Canvas API allows JavaScript to draw shapes, images, and text on an HTML canvas.
The canvas rendering output can vary slightly depending on factors like browser version, GPU, installed fonts, pixel rendering methods, etc.
Audio, Speech, and Media
CreepJS collects information about the browser's supported audio format, frequency, speech synthesis capability, available voices, speech languages, media codecs (video and audio), and more.
The lack of specific audio capabilities (e.g., no available voices for speech synthesis or limited codec support) or variations in audio context can reveal the use of automation tools.
Screen
Since screen resolution varies among devices, CreepJS leverages these differences to fingerprint a browser's screen information. Automation tools like Selenium and Playwright often simulate screen properties differently than actual browsers, typically due to their default settings or incomplete replication of physical devices.
By detecting mismatches in screen resolution, viewport size, and device scaling factors, CreepJS can identify automation tools or distinguish between browser types.
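A minimal sketch of such mismatch checks is shown here. The dictionary keys mirror the browser's `screen` and `window` properties, but the function name and the individual rules are assumptions for illustration; in particular, treating an 800x600 screen as suspicious assumes that size is a leftover headless default rather than a real monitor.

```python
def find_screen_mismatches(screen):
    """Return a list of implausible combinations in reported screen metrics."""
    issues = []

    # the viewport cannot normally be larger than the screen that contains it
    if (screen["innerWidth"] > screen["screenWidth"]
            or screen["innerHeight"] > screen["screenHeight"]):
        issues.append("viewport is larger than the reported screen")

    # the available area excludes taskbars, so it cannot exceed the full screen
    if (screen["availWidth"] > screen["screenWidth"]
            or screen["availHeight"] > screen["screenHeight"]):
        issues.append("available area is larger than the reported screen")

    # a device scale factor must be a positive number
    if screen.get("devicePixelRatio", 1) <= 0:
        issues.append("device pixel ratio is not positive")

    # assumption: 800x600 is treated as a headless default, not a real display
    if (screen["screenWidth"], screen["screenHeight"]) == (800, 600):
        issues.append("screen is 800x600, a size associated with headless defaults")

    return issues
```

Setting a realistic window size in your automation tool addresses the first and last rule, which is one reason the later examples resize the browser window explicitly.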
Now that you understand how CreepJS works and the types of data it collects, let's test actual scraping tools with it.
How to Test Browser Automation Tools on CreepJS?
To demonstrate how the CreepJS test works, let's test Selenium and Playwright, two of the most popular browser automation tools, using CreepJS itself as the target site.
In each case, we'll grab a full-page screenshot, but we'll specifically extract the Headless score section to simplify the result.
Note that a regular browser (Chrome) would show a 0% headless and a 0% stealth score, as shown below. Any deviation from this result may signal automation:
Now, we'll show the test code in Python since it's the most popular web scraping language.
Test Selenium Fingerprinting on CreepJS
Selenium is the most popular browser automation tool, allowing you to control the browser via an automated WebDriver.
To begin the Selenium test on CreepJS, ensure you install the library:
pip3 install selenium
Launch a Chrome instance in headless mode, open the CreepJS website, and take a full-page screenshot of the page:
# pip3 install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from time import sleep
# run Chrome in headless mode
options = Options()
options.add_argument("--headless=new")
# start a driver instance
driver = webdriver.Chrome(options=options)
# open the target website
driver.get("https://abrahamjuliot.github.io/creepjs/")
# wait for the page to load
sleep(10)
# define a function to get scroll dimensions
def get_scroll_dimension(axis):
    return driver.execute_script(f"return document.body.parentNode.scroll{axis}")
# get the page scroll dimensions
width = get_scroll_dimension("Width")
height = get_scroll_dimension("Height")
# set the browser window size
driver.set_window_size(width, height)
# get the full-body element
full_body_element = driver.find_element(By.TAG_NAME, "body")
# take a full-page screenshot
full_body_element.screenshot("creepjs-selenium.png")
# quit the browser
driver.quit()
The above returns a full-page screenshot of CreepJS, showing the results of the browser fingerprinting test.
Notably, the Headless score section shows that Selenium is 100% headless with 0% stealth capability. This result (100% headless) is high compared to the 0% observed when we ran a regular browser through CreepJS. Here's the extracted screenshot for that section:
With this result, an anti-bot can deduce that the browser is automated and will likely block it. Although the stealth score is 0%, the 100% headless score gives Selenium away as bot-like.
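If you prefer to compare scores as numbers instead of reading them off a screenshot, you can parse them from the page's visible text. The sketch below assumes the text contains phrases such as "100% headless" or "0% stealth", as reported in this article; the actual CreepJS layout may differ or change, and the function name is made up for this example.

```python
import re

# matches phrases like "31% like headless", "100% headless", or "0% stealth"
SCORE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)%\s+(like headless|headless|stealth)", re.IGNORECASE
)


def parse_headless_scores(text):
    """Map each score label found in the text to its percentage."""
    return {label.lower(): float(value) for value, label in SCORE_PATTERN.findall(text)}
```

With Selenium, the text could come from `driver.find_element(By.TAG_NAME, "body").text` once the page has finished its analysis.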
Playwright Fingerprinting on CreepJS
Playwright is also a browser automation library, but it uses the Chrome DevTools Protocol to control the browser. Let's see if CreepJS returns a different result for Playwright.
Ensure you install the library and its browser binaries if you've not done so:
pip3 install playwright
playwright install
Visit the target page with a headless Chromium instance and take a full-page screenshot:
# pip3 install playwright
# playwright install
from playwright.sync_api import sync_playwright
from time import sleep
# initialize Playwright in synchronous mode
with sync_playwright() as p:
    # launch the browser (headless by default)
    browser = p.chromium.launch()
    # create a new page instance
    page = browser.new_page()
    # navigate to the target web page
    page.goto("https://abrahamjuliot.github.io/creepjs/")
    # wait for the page to load
    sleep(10)
    # take a full-page screenshot
    page.screenshot(path="creep-js-playwright.png", full_page=True)
    # close the browser
    browser.close()
We got the same result, proving that Playwright also has a high likelihood of being flagged as a bot:
Again, the headless score is above a regular browser's score. CreepJS has revealed that these regular headless browser tools don't handle fingerprinting properly. What if we use a fortified one? Let's find out.
Improve CreepJS Test Result With Fortified Headless Browsers
Fortified headless browsers feature patches that hide common bot-like attributes, improving performance on the CreepJS browser fingerprinting test. Let's improve the previous Selenium test using SeleniumBase with Undetected ChromeDriver (UC).
SeleniumBase is an automation tool you can pair with the Undetected ChromeDriver to avoid anti-bot detection. Its advantage over the regular Selenium library is its ability to patch inconsistent browser fingerprints, such as mimicking the GUI browser runtime in headless mode.
First, install the library using pip:
pip3 install seleniumbase
Now, spin up a Chrome instance in headless mode, launch the target website, and take a full-page screenshot:
# pip3 install seleniumbase
from seleniumbase import Driver
# initialize the driver in headless mode with UC enabled
driver = Driver(uc=True, headless=True)
# set the target URL
url = "https://abrahamjuliot.github.io/creepjs/"
# open the URL using UC mode with a 6-second reconnection time to bypass the initial detection
driver.uc_open_with_reconnect(url, reconnect_time=6)
# wait for the page to load
driver.sleep(10)
element = driver.wait_for_element_visible("body")
height = element.size["height"]
driver.set_window_size(1920, height)
# take a screenshot of the current page and save it
driver.save_screenshot("seleniumbase-creep.png")
# close the browser and end the session
driver.quit()
SeleniumBase returns 0% for both the headless and stealth scores, suggesting it is more likely to evade anti-bots than vanilla Selenium. Although the report still rates the browser as 31% like headless, the 0% headless and stealth scores indicate that SeleniumBase with UC hides the most obvious automation signals.
Awesome! You've increased your scraper's trust score using SeleniumBase.
However, SeleniumBase has its limitations. Because it is open source, anti-bot vendors can study its evasion techniques and will eventually adapt to them. Moreover, it's memory-intensive because every session runs a full browser instance, making it unsuitable for large-scale web scraping.
Other headless browser tools also have stealth plugins to improve anti-bot evasion. For instance, Playwright has Playwright Stealth, while Puppeteer has the Puppeteer Stealth plugin.
However, these tools are often insufficient for bypassing advanced anti-bot systems. Although you can further improve a tool like Puppeteer Stealth with custom evasions, the process is generally technical, time-consuming, and unscalable.
Fortunately, there's an easy way to solve all these limitations and scrape without getting blocked. You'll find out below.
Best Tool to Bypass Browser Fingerprinting
Although anti-bots use other detection mechanisms to block bots, browser fingerprinting is among the most challenging. Even if the previously mentioned tools evade browser fingerprinting, they won't escape the other detection techniques.
The easiest way to scrape at scale without limitations is to use a web scraping API like the ZenRows Scraper API. ZenRows features all the tools required for successful scraping, such as advanced fingerprint evasion, premium proxy rotation, request header management, anti-bot auto-bypass, JavaScript rendering, and more. It also has headless browser features, allowing you to interact with a web page as a human would.
You only need to send a single API call in any programming language. Let's see how ZenRows works by scraping this Anti-bot Challenge page.
Sign up to load the ZenRows Request Builder. Then, paste the target URL in the link box and activate Premium Proxies and JS Rendering.
Select your programming language (Python, in this case) and choose the API connection mode. Copy and paste the generated code into your Python script.
Here's the generated Python code:
# pip3 install requests
import requests
url = 'https://www.scrapingcourse.com/antibot-challenge'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
'url': url,
'apikey': apikey,
'js_render': 'true',
'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
The above scraper accesses the protected website and scrapes its full-page HTML, as shown:
<html lang="en">
<head>
<!-- ... -->
<title>Antibot Challenge - ScrapingCourse.com</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<h2>
You bypassed the Antibot challenge! :D
</h2>
<!-- other content omitted for brevity -->
</body>
</html>
Congratulations! 🎉 You just found a lasting solution to the limitations of anti-bot detection. Your scraper now bypasses anti-bots using ZenRows.
Conclusion
You've learned how CreepJS works, including its browser fingerprinting steps and the data it collects. The tool demonstrates how real-life anti-bots detect scrapers via browser fingerprinting and how you can leverage its features to improve your web scraper.
However, manually patching open-source web scraping tools isn't sustainable and doesn't guarantee success. We recommend using ZenRows, a lightweight, all-in-one web scraping solution, to bypass blocks at scale.
Try ZenRows for free today without a credit card commitment!