Are you scraping a Cloudflare-protected website and want to know if you can bypass it with the Undetected ChromeDriver?
This article answers that question and shows how to improve the Undetected ChromeDriver to avoid Cloudflare's detection.
Quick Answer: Probably Not, But There Are Solutions
The short answer is that the Undetected ChromeDriver can't reliably bypass Cloudflare on its own. Although the library aims to evade anti-bots like Cloudflare, it has some design flaws that give it away as a bot.
First, its request headers contain bot-like values, such as the "HeadlessChrome" token in the User-Agent string when running in headless mode. This telltale value allows anti-bot systems to detect the presence of an automated WebDriver. It also doesn't configure proxies automatically, so all requests come from the same IP address, increasing the likelihood of IP bans.
Another flaw is that it can't handle the browser fingerprinting and machine learning detection techniques employed by Cloudflare and other advanced anti-bot systems.
That said, there are a few ways to mitigate these limitations and improve the library's evasion capabilities. In the next section, you'll see how.
2 Methods to Improve Undetected ChromeDriver for Scraping
On its own, the Undetected ChromeDriver library struggles against Cloudflare, but you can improve its odds with the following two methods.
1. Use Premium Proxies
A proxy server changes your IP address, so it looks like you're requesting from a different location.
You can configure your scraper to use free proxies. However, those are short-lived and unreliable. The best option is to use premium web scraping proxies, which require authentication credentials.
Most of these premium services offer a proxy rotation feature that changes your IP address per request, making your requests look more legitimate.
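As a rough sketch of how a proxy could be wired into the Undetected ChromeDriver, the snippet below builds Chrome's `--proxy-server` argument and rotates through a small pool in round-robin order. The proxy addresses are placeholders, and `PROXY_POOL`, `build_proxy_argument`, and `make_driver` are hypothetical names used for illustration, not part of the library. It also assumes proxies that don't need a username and password: Chrome's `--proxy-server` flag carries no credentials, so authenticated premium proxies typically need extra tooling that this sketch leaves out.

```python
from itertools import cycle

# Placeholder proxies (assumption); replace them with your provider's endpoints.
PROXY_POOL = ["203.0.113.10:8080", "203.0.113.11:8080"]


def build_proxy_argument(address, scheme="http"):
    """Return the Chrome command-line argument that routes traffic through a proxy."""
    return f"--proxy-server={scheme}://{address}"


def make_driver(proxy_argument):
    """Start the Undetected ChromeDriver with the given proxy argument."""
    # imported here so the helpers above work without the package installed
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(proxy_argument)
    return uc.Chrome(options=options)


# round-robin rotation: each next() call yields the next proxy in the pool
proxy_rotation = cycle(PROXY_POOL)
```

To use it, call `make_driver(build_proxy_argument(next(proxy_rotation)))` for each new browser session, and close the session with `driver.quit()` when you're done. Rotating per session rather than per request is a simplification here, since a Chrome instance keeps the proxy it was started with.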
2. Optimize Your Headers
The request headers provide information about the request source. Inconsistencies in header values like the User-Agent and the User-Agent Client Hints can get you blocked while scraping.
Optimizing the request headers can improve the library's ability to avoid blocks. For instance, the "HeadlessChrome" token in the following User-Agent string reveals that your request is from an automated script and can get you blocked:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36
A good User-Agent for web scraping should describe a legitimate browser. A typical browser User-Agent like the following is more likely to pass:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
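One possible way to apply this is sketched below: a small helper swaps the "HeadlessChrome" token for the regular "Chrome" one, and the result is passed to Chrome through its `--user-agent` argument. The names `strip_headless_token` and `make_driver_with_user_agent` are hypothetical. Overriding the User-Agent this way doesn't touch other signals, such as the Client Hints, so treat it as a partial fix.

```python
def strip_headless_token(user_agent):
    """Replace the "HeadlessChrome" product token with the regular "Chrome" one."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


def make_driver_with_user_agent(user_agent):
    """Start a headless Undetected ChromeDriver that sends the given User-Agent."""
    # imported here so the helper above works without the package installed
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(f"--user-agent={user_agent}")
    return uc.Chrome(options=options, headless=True)


headless_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36"
)
clean_ua = strip_headless_token(headless_ua)
print(clean_ua)
```

Printing `clean_ua` shows the same string with `Chrome/123.0.0.0` in place of `HeadlessChrome/123.0.0.0`. Passing it to `make_driver_with_user_agent(clean_ua)` would start the browser with that value.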
However, these solutions are usually not enough against Cloudflare’s advanced detection system. Fortunately, there’s an easier way to beat Cloudflare during web scraping. You’ll see this in the next section.
Best Alternative to Undetected ChromeDriver: Web Scraping API
Proxy and header configuration can increase your chances of bypassing Cloudflare with the Undetected ChromeDriver. However, manually modifying proxies and headers at scale can be difficult and unsustainable.
The best way to scrape a Cloudflare-protected website is to use a web scraping API like ZenRows. It configures the request headers, autorotates premium proxies, and bypasses CAPTCHAs and other anti-bot systems at scale.
Let's run two quick demos against a Cloudflare-protected page, G2 Reviews, starting with the Undetected ChromeDriver and then ZenRows.
Here's the code to access and extract the protected website's HTML with the Undetected ChromeDriver:
# import the required library
import undetected_chromedriver as uc

# run Chrome in headless mode
driver = uc.Chrome(headless=True)

try:
    # visit the target website
    driver.get("https://www.g2.com/products/asana/reviews")
    # print the page source
    print(driver.page_source)
finally:
    # close the browser and release its resources
    driver.quit()
The code outputs the following, indicating that Undetected ChromeDriver got blocked by Cloudflare:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- ... -->
<title>Attention Required! | Cloudflare</title>
</head>
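If you want your scraper to recognize this kind of response automatically, a simple heuristic is to inspect the page title. The sketch below is an assumption-laden example: `is_cloudflare_block_page` is a hypothetical helper, and it only matches the "Attention Required! | Cloudflare" title shown above, so other block or challenge pages with different titles would slip through.

```python
import re


def is_cloudflare_block_page(html):
    """Heuristic: True when the <title> matches Cloudflare's "Attention Required!" page."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return False
    title = match.group(1).strip()
    return "Attention Required!" in title and "Cloudflare" in title


# sample titles taken from the two outputs in this article
blocked_html = "<html><head><title>Attention Required! | Cloudflare</title></head></html>"
normal_html = "<html><head><title>Asana Reviews, Pros + Cons, and Top Rated Features</title></head></html>"

print(is_cloudflare_block_page(blocked_html))  # True
print(is_cloudflare_block_page(normal_html))   # False
```

In a real script, you would pass `driver.page_source` to the helper and retry or switch strategy when it returns `True`.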
Now, let's access that web page using ZenRows.
Sign up to open the ZenRows Request Builder. Paste the target URL in the link box, toggle on JS Rendering, and activate Premium Proxies. Choose Python as your programming language and select the API request mode. Copy and paste the generated code into your script:
The generated code uses the Requests library as the HTTP client. Ensure you install it using pip:
pip install requests
Here's a slightly modified version of the generated code:
# pip install requests
import requests

# define your request parameters
params = {
    "url": "https://www.g2.com/products/asana/reviews",
    "apikey": "<YOUR_ZENROWS_API_KEY>",
    "js_render": "true",
    "premium_proxy": "true",
}

# send your request and get the response
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above code gets past Cloudflare's protection and extracts the page's content, as shown:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
<title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
<!-- other content omitted for brevity -->
</body>
Congratulations! You just bypassed Cloudflare with ZenRows.
ZenRows also supports JavaScript instructions for scraping dynamically loaded content, so you can replace the Undetected ChromeDriver and Selenium with ZenRows and avoid the overhead of running browser instances.
Conclusion
In this article, you've learned that setting up a proxy and modifying the request headers can improve Undetected ChromeDriver's ability to bypass Cloudflare.
However, these methods are usually insufficient against Cloudflare's advanced detection system. We recommend integrating ZenRows, the all-in-one scraping solution, to scrape any website without getting blocked. Try ZenRows for free!