Does your web scraper get blocked by Cloudflare? One of the best ways to solve this problem is to use a headless browser, like Selenium. Unfortunately, plain Selenium may get blocked by Cloudflare's anti-bot systems.
But there are methods to bypass Cloudflare with Selenium. This guide will cover a few of the best ones:
- Method #1: Undetected ChromeDriver.
- Method #2: SeleniumBase.
- Method #3: Selenium Stealth plugin.
- Method #4: Premium proxies.
- Method #5: Web scraping API.
Let's go!
How Does Cloudflare Detect Selenium?
Cloudflare is a content delivery network (CDN) and cyber security provider. On the security side, it offers a Web Application Firewall (WAF) to defend protected websites against cyber threats, such as cross-site scripting (XSS) and DDoS attacks.
Cloudflare stops malicious HTTP traffic from moving to the server and performs security checks to mitigate Layer 7 (application layer) DDoS attacks.
Unfortunately, Cloudflare's security system doesn't spare web scrapers. It can detect and block automated browsers like the Selenium WebDriver, recognizing them as bots.
If you try to access a Cloudflare-protected website, an interstitial page appears for about 5 seconds while Cloudflare analyzes your traffic for threats or bot-like signals. If the request passes the checks, you're redirected to the target page. Otherwise, the interstitial page triggers a CAPTCHA, preventing access to the target page.
Cloudflare's bot detection techniques can be passive or active, depending on the website's implementation. Passive bot detection uses backend server detection techniques, like TLS fingerprinting, HTTP request headers and IP address reputation analysis, to identify bots. Active bot detection happens on the client side, using CAPTCHAs, event tracking, canvas fingerprinting, etc.
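To make the idea of passive detection more concrete, here's a toy sketch of a server-side check on HTTP request headers. The rules and the `passive_bot_signals` function are invented for illustration only. They are not Cloudflare's actual logic, which is far more sophisticated and not public.

```python
# Toy illustration only: these rules are invented for this sketch and are
# NOT Cloudflare's actual detection logic.


def passive_bot_signals(headers: dict[str, str]) -> list[str]:
    """Return the bot-like signals found in a request's headers."""
    # HTTP header names are case-insensitive, so normalize them first
    normalized = {name.lower(): value for name, value in headers.items()}
    signals = []

    user_agent = normalized.get("user-agent", "")
    if not user_agent:
        signals.append("missing User-Agent")
    elif "HeadlessChrome" in user_agent:
        # headless Chrome can announce itself in its default User-Agent
        signals.append("headless browser User-Agent")

    # real browsers normally send an Accept-Language header
    if "accept-language" not in normalized:
        signals.append("missing Accept-Language")

    return signals


# a request that looks like it came from an unpatched headless browser
headless_request = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36",
}
print(passive_bot_signals(headless_request))
```

A real system would combine many more signals, such as the TLS fingerprint and IP reputation mentioned above, before deciding to challenge a request.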
You can learn more about Cloudflare in our general guide to bypassing Cloudflare.
Can Selenium Bypass Cloudflare?
Yes, it's possible to bypass Cloudflare with Selenium in Python. While vanilla Selenium is usually insufficient, you can install extension libraries that help Selenium avoid bot detection.
To test vanilla Selenium's anti-bot bypass efficiency, let's see how it performs against a Cloudflare-protected website like the DataCamp sign-in page using the following code block:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
# create an instance of Chrome options
options = Options()
# add argument for headless mode
options.add_argument("--headless=new")
# initialize the Chrome driver with the specified options
driver = webdriver.Chrome(options=options)
# navigate to the specified URL
driver.get("https://www.datacamp.com/users/sign_in")
# wait for 20 seconds to allow the page to load fully
time.sleep(20)
# take a screenshot of the current page and save it
driver.save_screenshot("datacamp.png")
# quit the browser and end the WebDriver session
driver.quit()
After running the script, the request got stuck in the interstitial Cloudflare page, preventing access to the HTML elements:
As you can see, the web page cannot be accessed with plain Selenium. Now, let's see what we can do about that.
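Instead of inspecting the screenshot by hand, you can also check the page title to see whether the browser is still stuck on the interstitial page. The sketch below polls the title until it stops looking like a challenge or a timeout expires. The marker strings in `CHALLENGE_TITLES` and the helper names are assumptions made for this example: they're based on titles Cloudflare's pages have commonly used, and they may change.

```python
import time
from typing import Callable

# Assumption: page titles commonly seen on Cloudflare's challenge and block
# pages. They can change, so treat this tuple as a starting point.
CHALLENGE_TITLES = ("just a moment", "attention required")


def is_challenge_title(title: str) -> bool:
    """Return True if the page title looks like a Cloudflare challenge page."""
    lowered = title.lower()
    return any(marker in lowered for marker in CHALLENGE_TITLES)


def wait_for_challenge(
    get_title: Callable[[], str], timeout: float = 20.0, poll: float = 0.5
) -> bool:
    """Poll the title until it no longer looks like a challenge.

    Returns True if the challenge cleared, or False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not is_challenge_title(get_title()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With a Selenium driver, you could call `wait_for_challenge(lambda: driver.title)` right after `driver.get(...)` in place of a fixed `time.sleep(20)`. A `False` result means the browser never got past the interstitial page.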
How to Bypass Cloudflare With Selenium
As we've shown, vanilla Selenium alone can't get past Cloudflare's anti-bot checks. Let's look at five proven tweaks and tricks to bypass Cloudflare with Selenium.
Method #1: Undetected ChromeDriver
Undetected ChromeDriver is a selenium.webdriver.Chrome patch developed to access protected sites without triggering anti-bot measures. With Undetected ChromeDriver, Selenium's ChromeDriver gains more stealth, allowing you to bypass bot detectors like Cloudflare.
To get started with the Undetected ChromeDriver in Python, install it with pip:
pip3 install undetected-chromedriver
Let's use nowsecure.nl as a demo website with simple Cloudflare protection to demonstrate how to use the Undetected ChromeDriver.
Create a Python file, import undetected_chromedriver, and instantiate it in headless mode. Request the target website, use Python's built-in time.sleep function to wait for the page to load, and capture the result with the save_screenshot() method:
# import the required libraries
import undetected_chromedriver as uc
import time
# create an instance of ChromeOptions for undetected_chromedriver
options = uc.ChromeOptions()
# set headless mode to True (runs Chrome in background)
options.headless = True
# initialize the undetected Chrome driver with specified options
driver = uc.Chrome(use_subprocess=True, options=options)
# navigate to the specified URL
driver.get("https://nowsecure.nl/")
# wait for 20 seconds to allow the page to load fully
time.sleep(20)
# take a screenshot of the current page and save it
driver.save_screenshot("nowsecure.png")
# quit the browser and end the WebDriver session
driver.quit()
The above grabs the following screenshot, showing a checkmark confirming anti-bot bypass:
While Undetected ChromeDriver can help you access and scrape websites with basic anti-bot protection, it has a low success rate against DataCamp. It also doesn't work against websites like G2 Reviews, which use the most advanced Cloudflare security.
However, this is only one of five methods. There are more techniques you can apply.
Method #2: SeleniumBase
SeleniumBase is a Python framework for browser automation and testing that lets you run Selenium in stealth mode using the Undetected ChromeDriver. SeleniumBase is more efficient than the Undetected ChromeDriver because it uses advanced browser patches to bypass anti-bot checks.
Let's see how SeleniumBase performs against the DataCamp sign-in page. First, install the library using pip:
pip3 install seleniumbase
Import the Driver class from SeleniumBase and instantiate it in Undetected ChromeDriver (UC) mode:
# import the required library
from seleniumbase import Driver
# create a Driver instance with undetected_chromedriver (uc) and headless mode
driver = Driver(uc=True, headless=True)
Request the target website, pause the browser for 20 seconds, and grab a screenshot of the webpage. Then, quit the browser:
# ...
# navigate to the specified URL
driver.get("https://www.datacamp.com/users/sign_in")
# pause execution for 20 seconds
driver.sleep(20)
# take a screenshot of the current page and save it
driver.save_screenshot("datacamp.png")
# close the browser and end the session
driver.quit()
Here's the full code after combining the two snippets:
# import the required library
from seleniumbase import Driver
# create a Driver instance with undetected_chromedriver (uc) and headless mode
driver = Driver(uc=True, headless=True)
# navigate to the specified URL
driver.get("https://www.datacamp.com/users/sign_in")
# pause execution for 20 seconds
driver.sleep(20)
# take a screenshot of the current page and save it
driver.save_screenshot("datacamp.png")
# close the browser and end the session
driver.quit()
SeleniumBase returns a screenshot of the sign-in page, confirming that it bypassed Cloudflare's waiting room:
That's a step ahead! Still, SeleniumBase can't bypass highly protected websites at scale. Besides, the library is open source, allowing Cloudflare's developers to gain insights into its bypass mechanisms and block it.
Keep reading to learn more ways to bypass Cloudflare with Selenium.
Method #3: Selenium Stealth Plugin
The Selenium Stealth plugin is a helper that patches Selenium so its fingerprint looks more like a real browser's, e.g., setting the navigator.webdriver property to false, replacing the HeadlessChrome User Agent in headless mode with an actual Chrome User Agent, and more.
However, Selenium Stealth has some limitations. It only partially patches Selenium and leaks some bot-like attributes, reducing its chances of bypassing advanced detection techniques of more complex Cloudflare security measures. Additionally, it doesn't handle IP bans or geo-restrictions out of the box.
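As a rough sketch of what using the plugin looks like, the snippet below applies Selenium Stealth to a regular Chrome driver. It assumes you've installed the package with `pip3 install selenium-stealth`. The values in `STEALTH_SETTINGS` and the `build_stealth_driver` helper are illustrative choices for this example, so adjust them to match the browser profile you want to imitate.

```python
# pip3 install selenium selenium-stealth

# Illustrative values passed to selenium_stealth.stealth(); pick ones that
# match the real browser profile you want to imitate.
STEALTH_SETTINGS = {
    "languages": ["en-US", "en"],
    "vendor": "Google Inc.",
    "platform": "Win32",
    "webgl_vendor": "Intel Inc.",
    "renderer": "Intel Iris OpenGL Engine",
    "fix_hairline": True,
}


def build_stealth_driver(headless: bool = True):
    """Create a Chrome driver and apply the Selenium Stealth patches to it."""
    # imported here so the settings above can be inspected without a browser
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium_stealth import stealth

    options = Options()
    if headless:
        options.add_argument("--headless=new")

    driver = webdriver.Chrome(options=options)
    # patch the browser's JavaScript-visible properties before any navigation
    stealth(driver, **STEALTH_SETTINGS)
    return driver
```

Call `build_stealth_driver()`, use the returned driver as in the earlier examples, and finish with `driver.quit()`. Given the limitations above, don't expect this alone to get past Cloudflare's stricter configurations.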
Read our detailed tutorial on using Selenium Stealth in Python to learn more about bypassing Cloudflare with the Selenium Stealth plugin.
Method #4: Premium Proxies
Proxies route your requests through a different IP address, so it looks like you're browsing from another machine. Adding proxies to your scraper is handy for getting around rate limits, geo-restrictions, and IP bans.
You can easily integrate proxies with Selenium. There are two categories of proxies: free and premium. Free proxies are only suitable for quick prototyping or testing. Due to their short lifespan, you shouldn't use them for real-life projects.
For large-scale web scraping, you should use premium proxies since they're more reliable and efficient. Features like IP auto-rotation and geo-targeting automate much of the work and improve your chances of bypassing anti-bot systems.
One of the best premium proxy providers with such integrations is ZenRows. In addition to premium residential proxies, ZenRows provides a full-scale scraping feature and helps you extract geo-restricted or rate-limited content at scale without getting blocked.
To learn more, check out our tutorial on implementing proxies in Selenium.
While proxies sometimes work, Cloudflare implements more security measures beyond rate-limiting, so they may be insufficient against its advanced anti-bot mechanisms.
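As a minimal sketch of proxy rotation in Selenium, the snippet below cycles through a small pool and passes each proxy to Chrome with the `--proxy-server` flag. The addresses in `PROXIES` are hypothetical placeholders from a documentation-only IP range, and the helper names are invented for this example. Replace the pool with the endpoints your provider gives you.

```python
from itertools import cycle

# Hypothetical proxy endpoints (documentation-only IP range).
# Replace them with the addresses from your proxy provider.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]
_proxy_pool = cycle(PROXIES)


def next_proxy_argument() -> str:
    """Return a Chrome --proxy-server flag for the next proxy in the pool."""
    return f"--proxy-server=http://{next(_proxy_pool)}"


def build_proxied_driver():
    """Start headless Chrome routed through the next proxy in the pool."""
    # imported here so the rotation logic can be tried without a browser
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument(next_proxy_argument())
    return webdriver.Chrome(options=options)
```

Each call to `build_proxied_driver()` starts a browser on the next IP in the pool. Note that Chrome's `--proxy-server` flag doesn't accept a username and password, so proxies that require authentication need extra tooling.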
Are you curious about the method that works all the time? Jump to the next section.
Method #5: Web Scraping API to Bypass Cloudflare Every Time
Open-source Selenium plugins to bypass Cloudflare often struggle to keep up with Cloudflare's advanced detection techniques. The only way to bypass Cloudflare with guaranteed success is to use a web scraping API, such as ZenRows.
ZenRows provides all the tools you need to scrape any protected website at scale without limitations. It features auto-rotating premium proxies, auto-parsing, CAPTCHA and anti-bot auto-bypass, and more.
With ZenRows, you only need to make a single API call with your chosen programming language and watch it do the bypassing job behind the scenes. ZenRows also acts as a headless browser to replace Selenium, featuring various JavaScript instructions for interacting with web pages like humans.
To see how ZenRows works, let's use it to access and scrape the G2 Reviews website, which uses the most advanced Cloudflare protection.
Sign up to open the ZenRows Request Builder. Paste the target URL in the link box and activate Premium Proxies and JS Rendering. Then, select Python as your programming language and choose the API connection mode. Copy and paste the generated code into your scraper file.
The generated code should look like this:
# pip install requests
import requests
url = "https://www.g2.com/products/asana/reviews"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
"url": url,
"apikey": apikey,
"js_render": "true",
"premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above scraper accesses the protected website and scrapes its full-page HTML, as shown:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
<title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
<!-- ... -->
</head>
<body>
<!-- other content omitted for brevity -->
</body>
Congratulations! Your scraper now bypasses the highest Cloudflare protection level with ZenRows.
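Once you have the HTML, you'll usually want to extract data from it. As a small sketch using only Python's standard library, the snippet below pulls the page title out of an HTML string. The `TitleParser` class and `extract_title` function are helper names invented for this example, and `sample_html` is a shortened stand-in for the real response.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # only keep text that appears between <title> and </title>
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the page title found in an HTML string, or an empty string."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# shortened stand-in for the HTML returned by the request above
sample_html = (
    "<html><head><title>Asana Reviews, Pros + Cons, and Top Rated Features"
    "</title></head><body></body></html>"
)
print(extract_title(sample_html))
```

In the scraper above, you could call `response.raise_for_status()` to stop on a failed request and then pass `response.text` to `extract_title`. For anything beyond a quick check, a dedicated HTML parsing library is likely a better fit.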
Conclusion
In this guide, you've learned five ways to improve vanilla Selenium and bypass Cloudflare. Techniques such as using Undetected ChromeDriver, SeleniumBase, the Stealth Plugin, and premium proxies offer varying levels of success by targeting specific aspects of Selenium's operation.
However, the only method that works 100% of the time is employing a web scraping API, such as ZenRows. This solution handles all anti-bot bypass technicalities behind the scenes while you focus on your scraping logic.
Try ZenRows for free now without a credit card!
Frequently Asked Questions
What Is Selenium?
Selenium is an open-source browser automation framework with bindings for Python and several other languages, and it's widely used for scraping web pages. Its WebDriver emulates user interaction, such as clicking buttons, scrolling a page, executing custom JavaScript code, and simulating user input. It automates several browsers, including Firefox and Chrome, using the WebDriver protocol.