How to Avoid Bot Detection with Selenium

January 5, 2023 · 9 min read

Is it possible to avoid bot detection with Selenium? Yes, it is. We know how annoying antibot protections can get, but there are solutions you can use to get around them. We'll cover the 9 most important techniques to bypass Selenium detection.

How do antibots work?

Bot detection is the process of scanning and filtering network traffic to detect and block malicious bots. Anti-bot providers like Cloudflare, PerimeterX and Akamai work tirelessly to find ways to detect bots that use headless browsers, relying on header data as well as different behavioral patterns.

When a client requests a web page, information about the nature of the request and the requester is sent to the server to be processed. Active and passive detection are the 2 main methods an antibot uses to detect bot activities; check out our article on bot detection to learn more.

How do websites detect Selenium?

Selenium is among the most popular tools in the field of web scraping. As a result, websites with strict anti-bot policies try to identify its unique attributes before blocking access to their resources.

Selenium bot detection mainly works by testing for predefined JavaScript variables that appear when running Selenium. Bot detectors often check for the words "Selenium" or "webdriver" in any of the variables (on the window object), as well as document variables named $cdc_ and $wdc_.

They also check for the values of automation indicator flags in the WebDriver, like useAutomationExtension and navigator.webdriver. These attributes are enabled by default to allow a better testing experience and as a security feature.
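
To see these signals from the bot's side, here's a minimal sketch (the URL is a placeholder, and real detectors run many more checks than this) that prints what a detection script would find in a default Selenium session:

from selenium import webdriver 
 
driver = webdriver.Chrome() 
driver.get("https://www.website.com") 
 
# navigator.webdriver is true in a default Selenium-driven Chrome 
print(driver.execute_script("return navigator.webdriver")) 
 
# Detector-style scan of document keys for driver signatures like $cdc_ or $wdc_ 
# (the exact variable names vary by driver build) 
print(driver.execute_script( 
    "return Object.keys(window.document).filter(k => /\\$cdc_|\\$wdc_/.test(k));" 
)) 
 
driver.quit()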

Top measures to avoid bot detection with Selenium

Bot detection nowadays has become a headache for web scrapers, but there are ways to get around it. Here are some of the techniques you can use to avoid bot detection using Python with Selenium:

1. IP Rotation / Proxy

One of the major ways that most bot detectors work is by inspecting IP behaviors. Web servers can draw a pattern from an IP address by keeping a log of every request. They use Web Application Firewalls (WAFs) to track and block IP address activities and blacklist suspicious IPs. Repetitive, programmatic requests hurt an IP's reputation and can get it blocked permanently.

To avoid Selenium bot detection, you can make use of IP rotation or proxies. Proxies act as an intermediary between the requester and the server. The responding server interprets the request as coming from the proxy server and not the client's computer. As a result, it won't be able to draw a pattern for behavioral analysis.

# Install selenium with pip install selenium 
from selenium import webdriver 
 
# Import chromedriver_autoinstaller to install chromedriver automatically 
import chromedriver_autoinstaller 
 
# Installs chromedriver which corresponds to the main Chrome automatically 
chromedriver_autoinstaller.install() 
 
# Define the proxy server 
PROXY = "IpOfTheProxy:PORT" 
 
# Set ChromeOptions() 
chrome_options = webdriver.ChromeOptions() 
 
# Add the proxy as an argument 
chrome_options.add_argument(f"--proxy-server={PROXY}") 
chrome = webdriver.Chrome(options=chrome_options) 
 
# Send the request 
chrome.get("https://www.google.com")

IP rotation is a method that cycles requests through a number of proxy servers, so no single IP accumulates a suspicious request pattern.
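
Building on the snippet above, here's a minimal sketch of IP rotation that starts a fresh driver with a randomly chosen proxy for each visit; the proxy addresses are placeholders you'd replace with your own pool:

import random 
from selenium import webdriver 
from selenium.webdriver.common.by import By 
import chromedriver_autoinstaller 
 
chromedriver_autoinstaller.install() 
 
# Placeholder pool: replace with your own proxy addresses 
PROXY_POOL = [ 
    "IpOfProxy1:PORT", 
    "IpOfProxy2:PORT", 
    "IpOfProxy3:PORT", 
] 
 
def driver_with_random_proxy(): 
    # Pick a proxy at random so requests don't all come from a single IP 
    proxy = random.choice(PROXY_POOL) 
    options = webdriver.ChromeOptions() 
    options.add_argument(f"--proxy-server={proxy}") 
    return webdriver.Chrome(options=options) 
 
for _ in range(3): 
    driver = driver_with_random_proxy() 
    driver.get("https://httpbin.org/ip")  # echoes the IP the server sees 
    print(driver.find_element(By.TAG_NAME, "body").text) 
    driver.quit()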

Frustrated that your web scrapers are blocked time and again? ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

2. Disabling the Automation Indicator WebDriver Flags

While web scraping with Selenium, the WebDriver exposes information that signals the request is automated. WebDriver implementations are required by the W3C WebDriver Specification to set properties like window.navigator.webdriver, to allow better testability and as a security feature. This gets you detected by web servers, which leads to being flagged or denied access.

With the availability of the execute_cdp_cmd(cmd, cmd_args) method, you can now easily execute Chrome DevTools Protocol (CDP) commands from Selenium. This makes it possible to change the default automation flags.

from selenium import webdriver 
import chromedriver_autoinstaller 
 
chromedriver_autoinstaller.install() 
 
# Create Chromeoptions instance 
options = webdriver.ChromeOptions() 
 
# Adding argument to disable the AutomationControlled flag 
options.add_argument("--disable-blink-features=AutomationControlled") 
 
# Exclude the collection of enable-automation switches 
options.add_experimental_option("excludeSwitches", ["enable-automation"]) 
 
# Turn off useAutomationExtension 
options.add_experimental_option("useAutomationExtension", False) 
 
# Setting the driver path and requesting a page 
driver = webdriver.Chrome(options=options) 
 
# Changing the property of the navigator value for webdriver to undefined; 
# Page.addScriptToEvaluateOnNewDocument persists the override across navigations 
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", { 
    "source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})" 
}) 
 
driver.get("https://www.website.com")

3. Rotating HTTP Header Information and User Agent

The HTTP header contains information about the browser, the operating system, the request type, the user language, the referrer, the device type and so on.

{ 
	"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
	"Accept-Encoding": "gzip, deflate", 
	"Accept-Language": "en-US,en;q=0.9", 
	"Host": "httpbin.org", 
	"Upgrade-Insecure-Requests": "1", 
	"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36", 
}

The values of some of these attributes are different for headless browsers by default, and antibots identify these discrepancies to distinguish between legitimate visitors and bots. To mitigate detection, rotating user agents can be helpful.

Here's how we used rotating HTTP header information to avoid bot detection with Selenium:

from selenium import webdriver 
import chromedriver_autoinstaller 
 
chromedriver_autoinstaller.install() 
driver = webdriver.Chrome() 
 
# Initializing a list with two user agents 
useragentarray = [ 
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36", 
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36", 
] 
for user_agent in useragentarray: 
	# Set the user agent iteratively as Chrome 108 and 107 
	driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": user_agent}) 
	print(driver.execute_script("return navigator.userAgent;")) 
	driver.get("https://www.httpbin.org/headers") 
driver.close()

4. Avoid Patterns

One of the major mistakes that automation testers make is building a bot that acts on a fixed schedule. Humans don't behave with machine-like consistency, so it becomes fairly easy for antibots to spot a bot's regular patterns. Navigating from page to page rapidly, which humans don't do, is another common mistake.

Randomizing delays, using waits, scrolling gradually and generally trying to mimic human behavior greatly improves your chances of avoiding bot detectors. Here's how to bypass Selenium detection by avoiding patterns:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
import chromedriver_autoinstaller 
from selenium.webdriver.support.wait import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
import time 
 
chromedriver_autoinstaller.install() 
driver = webdriver.Chrome() 
driver.get('https://scrapeme.live/shop/') 
 
# Wait 3.5 seconds on the webpage before trying anything 
time.sleep(3.5) 
 
# Wait up to 3 seconds for the element to appear 
wait = WebDriverWait(driver, 3) 
element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.woocommerce-loop-product__title'))) 
print('Product name: ' + element.text) 
 
# Wait 4.5 seconds before scrolling down 700px 
time.sleep(4.5) 
driver.execute_script('window.scrollTo(0, 700)') 
 
# Wait 2 seconds before clicking a link 
time.sleep(2) 
wait = WebDriverWait(driver, 3) 
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a.woocommerce-loop-product__link'))).click() 
 
# Wait up to 5 seconds for the element to appear 
wait = WebDriverWait(driver, 5) 
element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'woocommerce-product-details__short-description'))) 
print('Description: ' + element.text) 
 
# Close the driver after 3 seconds 
time.sleep(3) 
driver.close()

5. Remove JavaScript Signature

One of the ways bot detectors like FingerprintJS and Imperva work is by inspecting the JavaScript signature inside webdrivers, like chromedriver and geckodriver. This signature is stored in the cdc_ variable. Websites look for the cdc_ variable in the document before denying access.

We'll use a tool called Agent Ransack to search for this signature in the chromedriver.exe binary file. The same approach works for other web drivers, like geckodriver and edgedriver.

Normal ChromeDriver

As you can see, the signature is $cdc_asdjflasutopfhvcZLmcfl_. To evade detection, we can change "cdc" to a string of the same length, like "abc". First, we need to open the chromedriver binary; we'll use the Vim editor to open and edit it.

You can download Vim here; click on "standard self-installing executable" for Windows. Vim comes pre-installed on macOS and Linux.

Install Vim

After installation, open CMD, navigate to the folder containing chromedriver.exe, and run the following command:

vim.exe <pathTo>\chromedriver.exe

Then type :%s/cdc_/abc_/g to search and replace cdc_ with abc_.

Search and replace variables

Next, type :wq and hit Enter to save the changes and exit Vim.

Vim may generate backup files ending in ~ in the same folder. Delete these files.

New files in folder

Now, let's try the same search using Agent Ransack.

Final result

As you can see now, the cdc_ signature variable is no longer found in the file.
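
If you'd rather not edit the binary by hand, the same replacement can be scripted. Here's a minimal Python sketch, assuming chromedriver.exe sits in the current folder; back up the original file before running it:

# A minimal sketch: patch the cdc_ signature out of the chromedriver binary. 
# The path is an assumption; adjust it and keep a backup of the original file. 
from pathlib import Path 
 
driver_path = Path("chromedriver.exe") 
data = driver_path.read_bytes() 
 
# "abc_" has the same length as "cdc_", so binary offsets stay intact 
patched = data.replace(b"cdc_", b"abc_") 
driver_path.write_bytes(patched) 
print(f"Replaced {data.count(b'cdc_')} occurrence(s) of cdc_")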

6. Using Cookies

When trying to scrape data from social media platforms or other sites that require authentication, it's very common to log in repeatedly. Those repeated authentication requests raise an alarm, and the account might be blocked or faced with a CAPTCHA or JavaScript challenge for verification. To avoid this, we can use cookies: after logging in once, collect the login session cookies and reuse them in future runs.
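
Here's a minimal sketch of that flow using Selenium's cookie API; the URLs and the login step are placeholders:

# A minimal sketch of saving session cookies once and reusing them later. 
import pickle 
from selenium import webdriver 
 
driver = webdriver.Chrome() 
driver.get("https://www.website.com/login")  # placeholder login page 
# ... perform the login steps here once ... 
 
# Save the session cookies to disk 
with open("cookies.pkl", "wb") as f: 
    pickle.dump(driver.get_cookies(), f) 
 
# In a later run: load the cookies instead of logging in again 
driver.get("https://www.website.com")  # you must be on the domain before adding cookies 
with open("cookies.pkl", "rb") as f: 
    for cookie in pickle.load(f): 
        driver.add_cookie(cookie) 
driver.refresh()  # reload so the restored session takes effect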

7. Follow The Page Flow

When interacting with a website, it's important to follow the same flow that a human user would. This means clicking on links, filling out forms and navigating through the website in a natural way. By following the page flow, you can make it less obvious that you're performing automation.
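
As a minimal sketch (reusing the scrapeme.live demo site from the earlier example), this follows the listing-to-product path a human would take instead of jumping straight to a deep URL:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
import time 
 
driver = webdriver.Chrome() 
 
# Land on the listing page first, like a human following links 
driver.get('https://scrapeme.live/shop/') 
time.sleep(2) 
 
# Click through to a product instead of requesting its URL directly 
driver.find_element(By.CSS_SELECTOR, 'a.woocommerce-loop-product__link').click() 
time.sleep(2) 
 
# Move on via the page's own navigation, e.g. the browser's back button 
driver.back() 
driver.quit()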

8. Using a Browser Extension

Another way to bypass Selenium detection is to use a browser extension, like uBlock Origin, to stop JavaScript challenges and CAPTCHAs from being loaded on the page.

uBlock Origin is a free, open-source browser extension designed to block unwanted content, such as ads, tracking scripts and malware, from loading on web pages. It can also be configured to block the scripts that deliver JavaScript challenges and CAPTCHAs.

To use uBlock Origin to avoid Selenium bot detection, install the extension in your browser and configure it to block JavaScript challenges and CAPTCHAs. You can then drive the browser with Selenium as you normally would, and uBlock Origin will automatically keep the unwanted content from loading, as shown in the sketch below.
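
Here's a minimal sketch of loading the extension at startup; the .crx file path is an assumption, so download or export the packed extension first:

# A minimal sketch of starting Chrome with uBlock Origin loaded. 
# The .crx file path is an assumption: adjust it to your local copy. 
from selenium import webdriver 
 
options = webdriver.ChromeOptions() 
options.add_extension("ublock_origin.crx") 
 
driver = webdriver.Chrome(options=options) 
driver.get("https://www.website.com")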

It's important to note that uBlock Origin may not work with every website or block every type of JavaScript challenge and CAPTCHA. Still, it can meaningfully reduce the chances of your bot being detected, especially when combined with the other methods on this list.

9. Using a CAPTCHA Solving Service With Selenium

If you need to solve CAPTCHAs as part of your bot's workflow, you can use a CAPTCHA solving service like Anti-CAPTCHA or 2Captcha. These services use real people to solve CAPTCHAs, which can reduce the chances of your bot being flagged.
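
As an illustration, here's a hedged sketch of 2Captcha's classic in.php/res.php HTTP flow for a reCAPTCHA; endpoints and parameters may differ, so check the provider's current documentation:

# A hedged sketch of 2Captcha's classic HTTP API. The API key and 
# site key are placeholders; consult the provider's docs before relying on this. 
import time 
import requests 
 
API_KEY = "YOUR_2CAPTCHA_API_KEY" 
SITE_KEY = "TARGET_SITE_RECAPTCHA_KEY" 
PAGE_URL = "https://www.website.com" 
 
# Submit the CAPTCHA job; the response looks like "OK|<captcha_id>" 
resp = requests.post("http://2captcha.com/in.php", data={ 
    "key": API_KEY, 
    "method": "userrecaptcha", 
    "googlekey": SITE_KEY, 
    "pageurl": PAGE_URL, 
}) 
captcha_id = resp.text.split("|")[1] 
 
# Poll until a human worker returns the token 
while True: 
    time.sleep(5) 
    answer = requests.get("http://2captcha.com/res.php", params={ 
        "key": API_KEY, "action": "get", "id": captcha_id, 
    }).text 
    if answer.startswith("OK|"): 
        token = answer.split("|")[1] 
        break 
 
# Inject the token into the page, e.g. via Selenium: 
# driver.execute_script( 
#     'document.getElementById("g-recaptcha-response").innerHTML = arguments[0];', token)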

When using a CAPTCHA solving service with Selenium, use it sparingly: only when it's absolutely necessary, not for every single CAPTCHA you encounter.

There are two reasons for this. First, heavy use can itself raise red flags and increase the chances of your bot being detected.

Second, it can get expensive: most services charge per solved CAPTCHA, so frequent use significantly increases the cost of running your bot.

Whenever possible, rely on other methods, such as natural delays or browser extensions, to avoid triggering CAPTCHAs in the first place. That lowers both your detection risk and your operating costs.

Alternatives to base Selenium

Using Selenium to avoid bot detection is quite straightforward, but there are other methods that are just as effective, like Undetected ChromeDriver, Cloudscraper and scraping APIs.

ZenRows API

ZenRows' API-based antibot bypassing solution is one of the highest-rated options for web crawlers and scrapers. With a single API call, it provides rotating proxies, custom headers, WAF bypass and CAPTCHA bypass tools.

import requests 
 
# Unique API key and URL of page you want to scrape 
url = "https://nowsecure.nl" 
proxy = "http://YOUR_API_KEY:@proxy.zenrows.com:8001" 
proxies = {"http": proxy, "https": proxy} 
response = requests.get(url, proxies=proxies, verify=False) 
print(response.text)
ZenRows successful result

Undetected_ChromeDriver

Undetected_ChromeDriver is an open-source project that makes Selenium ChromeDriver "look" human. It's an optimized Selenium ChromeDriver patch that doesn't trigger antibot services like Distill Network, Imperva, DataDome and Botprotect.io.

It bypasses bot detection measures like JavaScript challenges more effectively than ordinary Selenium. JavaScript challenges are typically used to mitigate cyberattacks like DDoS and to deter scraping: the server sends the browser a piece of JavaScript code that it must execute correctly. Legitimate browsers have a JavaScript stack, so they understand and pass the challenge.

from selenium import webdriver 
import chromedriver_autoinstaller 
 
chromedriver_autoinstaller.install() 
driver = webdriver.Chrome() 
driver.get("https://nowsecure.nl")
Selenium for protected webpages

As you can see, Selenium fails to bypass the security in the above website, so let's give undetected_ChromeDriver a try.

# pip install undetected-chromedriver 
import undetected_chromedriver as uc  # the v2 patch is the default in recent releases 
 
# Initializing driver 
driver = uc.Chrome() 
 
# Try accessing a website with antibot service 
driver.get("https://nowsecure.nl")
Result with undetected_chromedriver

Awesome! We were able to avoid the bot detectors with undetected_ChromeDriver.

Undetected_ChromeDriver also works on Brave Browser and many other Chromium-based browsers. For more, you can check out this project on GitHub.

Even though it is able to bypass some bot detection measures better than base Selenium, undetected_chromedriver still falls short when it comes to bypassing strong antibot protections, like Cloudflare and Akamai.

Cloudscraper

Cloudflare is one of the most popular antibot solution providers on the market. Cloudscraper is a Python module, built on top of Requests, that bypasses Cloudflare's antibot page (also known as "I'm Under Attack Mode", or IUAM).

Cloudflare employs active and passive bot detection techniques. For passive detection, it maintains a catalog of devices, IP addresses and behavioral patterns known to be associated with malicious bot networks, then checks the IP address reputation, HTTP request headers, TLS fingerprint and so on to filter out bots. Its active detection methods include CAPTCHAs, canvas fingerprinting and event tracking.

Cloudflare's main bot protection method, known as "The Waiting Room", checks whether the client supports JavaScript and whether it can solve the sophisticated JavaScript challenges. Cloudscraper relies on a JavaScript engine/interpreter to solve the JavaScript challenges of protected websites.

Any script using Cloudscraper will sleep for about 5 seconds for the first visit to any site with Cloudflare antibots enabled.

# Install cloudscraper with pip install cloudscraper 
import cloudscraper 
 
# Create cloudscraper instance 
scraper = cloudscraper.create_scraper() 
# Or: scraper = cloudscraper.CloudScraper() # CloudScraper inherits from requests.Session 
print(scraper.get("https://nowsecure.nl").text)

Note that Cloudscraper doesn't work against the latest version of Cloudflare's antibot protection. You'll see an error like this:

Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

Conclusion

In this article, we've discussed why websites use antibots, how they work and the best ways to avoid bot detection with Selenium. We've also seen alternative tools, like undetected-chromedriver, Cloudscraper and ZenRows' API solution, to avoid detection while web scraping with Python.

ZenRows API solution is currently one of the best options to avoid bot detection, and you can get your API key for free.



Want to keep learning?

We will be sharing all the insights we have learned through the years in the following blog posts. If you don't want to miss a piece and keep learning, we'd be thrilled to have you in our newsletter.

No spam guaranteed. You can unsubscribe at any time.