Do you still get blocked while scraping with the Undetected ChromeDriver in Python? Changing the User Agent can increase your chances of bypassing anti-scraping measures.
This tutorial explains what the Undetected ChromeDriver User Agent is and how to change it to a custom one.
What Is the Undetected ChromeDriver User Agent?
The User Agent is an essential component of the HTTP request headers, the set of information that describes the request source (browser or HTTP client) and determines how the server handles a request. It identifies the client sending the request, including its version, host platform, and rendering engine.
Here's what an actual Chrome browser User Agent header looks like:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Similarly, the Undetected ChromeDriver User Agent defines the User Agent header sent while scraping with the Undetected ChromeDriver library.
Although it resembles the User Agent of an actual Chrome browser, the Undetected ChromeDriver's default User Agent has a bot-like HeadlessChrome property in headless mode.
Check it out yourself by requesting https://httpbin.io/user-agent, a test website that returns your User Agent header:
# pip3 install undetected-chromedriver
from selenium.webdriver.common.by import By
import undetected_chromedriver as uc
# set up Undetected ChromeDriver
options = uc.ChromeOptions()
# run in headless mode (use the Chrome flag directly, as newer Selenium
# versions no longer support the options.headless property)
options.add_argument("--headless=new")
# create a new instance of the Chrome driver
driver = uc.Chrome(options=options)
# navigate to the URL
driver.get("https://httpbin.io/user-agent")
# get the response body
response_body = driver.find_element(By.TAG_NAME, "body").text
# print the response
print(response_body)
# close the browser
driver.quit()
See below what the Undetected ChromeDriver's default User Agent looks like. The HeadlessChrome property tells the server that your request is automated, increasing the chances of getting blocked:
{
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/127.0.0.0 Safari/537.36"
}
The User Agent header is a common focus during web scraping because the server can leverage its information to determine whether a request is bot-like. Fortunately, you can spoof it to trick the server into accepting a request it would otherwise block.
For instance, you can replace the HeadlessChrome flag above with Chrome to appear as a legitimate Chrome browser.
Keep reading to see how to achieve that.
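As an aside, header spoofing isn't unique to browser automation: any HTTP client can send an arbitrary User Agent. Here's a minimal, illustrative sketch using Python's requests library against the same test endpoint:
# pip3 install requests
import requests
# spoof a real Chrome User Agent on a plain HTTP request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
response = requests.get("https://httpbin.io/user-agent", headers=headers)
# httpbin.io echoes back the User Agent it received
print(response.text)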
How to Change User Agent in Undetected ChromeDriver
The Undetected ChromeDriver supports ChromeOptions, which allows you to set a custom User Agent header.
To see how to set a custom User Agent in Undetected ChromeDriver, we'll change the default bot-like User Agent to the actual Chrome User Agent below:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Changing the default User Agent to a custom one in Undetected ChromeDriver involves adding the new value to the ChromeOptions. First, specify the User Agent string. Then, add it to the Chrome options:
# ...
# specify the custom User Agent
custom_user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
# add the custom User Agent to Chrome Options
options.add_argument(f"--user-agent={custom_user_agent}")
Modify the previous code with the above snippet, targeting the same test website (https://httpbin.io/user-agent):
# import the required libraries
from selenium.webdriver.common.by import By
import undetected_chromedriver as uc
# set up Undetected ChromeDriver
options = uc.ChromeOptions()
# specify the custom User Agent
custom_user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
# add the custom User Agent to Chrome Options
options.add_argument(f"--user-agent={custom_user_agent}")
# run in headless mode (works with newer Selenium versions)
options.add_argument("--headless=new")
# create a new instance of the Chrome driver
driver = uc.Chrome(options=options)
# navigate to the URL
driver.get("https://httpbin.io/user-agent")
# get the response body
response_body = driver.find_element(By.TAG_NAME, "body").text
# print the response
print(response_body)
# close the browser
driver.quit()
The above code outputs your custom User Agent, showing that the Undetected ChromeDriver now uses the new string:
{
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
Congratulations, you've removed the bot-like HeadlessChrome property from the Undetected ChromeDriver's User Agent string.
However, anti-bots can still block you when they discover you're sending multiple requests from a single machine with the same User Agent. So, a single custom User Agent is insufficient, especially when scraping at scale.
Rotating your User Agents is the best way to boost your chances of avoiding blocks. You'll learn how to do that in the next section.
Rotate User Agents in Undetected ChromeDriver
As mentioned earlier, sending many requests with the same User Agent results in blocking. Rotating the User Agent allows you to mimic different users, reducing the chances of anti-bot detection.
User Agent rotation involves switching between User Agents from a pool so your scraper uses a different User Agent per request.
To create a User Agent rotator, you'll use Python's random library to shuffle a User Agent list. Then, using the built-in itertools package, you'll create a generator that cycles through the shuffled list.
Add both libraries to your imports. Then, define a function that rotates User Agent strings from a shuffled list. This function accepts a user_agent_list parameter and returns a generator of rotating random User Agents:
# import the required libraries
# ...
import random
import itertools
# ...
# define a User Agent rotator
def user_agent_rotator(user_agent_list):
# shuffle the User Agent list
random.shuffle(user_agent_list)
# rotate the shuffle to ensure all User Agents are used
return itertools.cycle(user_agent_list)
Shuffling the list before cycling randomizes the order in which the User Agents appear, while itertools.cycle guarantees that every entry gets used rather than the code concentrating on a few specific User Agents.
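To see the rotator's behavior in isolation, here's a quick illustrative test using the user_agent_rotator function defined above, with placeholder strings standing in for real User Agents:
# placeholder strings stand in for real User Agent strings
ua_cycle = user_agent_rotator(["UA-1", "UA-2", "UA-3"])
# two full passes: every entry appears exactly twice, in the same shuffled order
print([next(ua_cycle) for _ in range(6)])
# e.g. ['UA-2', 'UA-1', 'UA-3', 'UA-2', 'UA-1', 'UA-3']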
The next step is to create a User Agent list. Remember that using an actual User Agent string is essential to avoid getting detected as a bot during scraping. We'll grab a few User Agents from the top list of User Agents for web scraping and form them into a list.
Add the list to your scraper like so:
# ...
# create a User Agent list
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
# ... add more User Agents
]
Since the rotator function returns a generator, initialize it and add it to the ChromeOptions. Calling the built-in next function pulls the next User Agent string from the shuffled list:
# initialize a generator for the User Agent rotator
user_agent_cycle = user_agent_rotator(user_agents)
# add the custom User Agent to Chrome Options
options.add_argument(f"--user-agent={next(user_agent_cycle)}")
Update the previous code with these changes, and you'll get this final code:
# import the required libraries
from selenium.webdriver.common.by import By
import undetected_chromedriver as uc
import random
import itertools
# define a User Agent rotator
def user_agent_rotator(user_agent_list):
# shuffle the User Agent list
random.shuffle(user_agent_list)
# rotate the shuffle to ensure all User Agents are used
return itertools.cycle(user_agent_list)
# set up Undetected ChromeDriver
options = uc.ChromeOptions()
# create a User Agent list
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
# ...
]
# initialize a generator for the User Agent rotator
user_agent_cycle = user_agent_rotator(user_agents)
# add the custom User Agent to Chrome Options
options.add_argument(f"--user-agent={next(user_agent_cycle)}")
# run in headless mode (works with newer Selenium versions)
options.add_argument("--headless=new")
driver = uc.Chrome(options=options)
# navigate to the URL
driver.get("https://httpbin.io/user-agent")
# get the response body
response_body = driver.find_element(By.TAG_NAME, "body").text
# print the response
print(response_body)
# close the browser
driver.quit()
Execute the above code a few times. You'll see that it outputs a random User Agent from the list on each run. Here's a sample result from five runs:
{
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
{
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
{
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
{
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
{
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
You've just created a custom User Agent rotator with the Undetected ChromeDriver.
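Note that Chrome reads the --user-agent flag only at launch, so the examples above change the User Agent between script runs, not within one. To rotate within a single script, you can launch a fresh driver per request. Here's a minimal sketch building on the code above:
# rotate within one script by launching a fresh driver per request,
# since Chrome reads --user-agent only at startup
user_agent_cycle = user_agent_rotator(user_agents)
for _ in range(3):
    options = uc.ChromeOptions()
    options.add_argument(f"--user-agent={next(user_agent_cycle)}")
    options.add_argument("--headless=new")
    driver = uc.Chrome(options=options)
    driver.get("https://httpbin.io/user-agent")
    print(driver.find_element(By.TAG_NAME, "body").text)
    driver.quit()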
Ensure you keep the browser versions in your User Agents updated to avoid posing as an outdated browser. Websites often detect and block requests from outdated browsers, as real users are less likely to run them. That's why the examples above use Chrome 127, the latest version at the time of writing.
Additionally, avoid using a User Agent that doesn't match the chosen browser. For instance, pairing a Firefox User Agent with the uc.Chrome WebDriver is bad practice: the mismatch signals to the server that you're a potential bot.
Manually maintaining and rotating a User Agent list is difficult at scale. Fortunately, there's a way to change the User Agent at scale without stress. Keep reading to learn this simple approach.
Change User Agent At Scale And Avoid Getting Blocked
The previous User Agent rotation method may help you at the beginning of your web scraping journey, but it isn't sustainable. The list gets longer as you scale, making it challenging to maintain.
Even if you combine that method with other solutions, such as adding proxies, it's still not foolproof against sophisticated anti-bot systems, which use advanced security measures to detect bots.
The best way to auto-rotate User Agents and bypass any anti-bot measure is to use a web scraping API like ZenRows. In addition to User Agent auto-rotation, ZenRows fixes your request headers, auto-rotates premium proxies, and bypasses CAPTCHAs and any other anti-bot measures under the hood.
ZenRows also works as a headless browser, allowing you to completely replace the Undetected ChromeDriver.Â
Let's use ZenRows to scrape the G2 Reviews page, a website heavily protected by Cloudflare, to see how it works. The previous scraper can't access this protected page.
To access that protected website, sign up to open the ZenRows Request Builder. Paste the target website in the link box and activate Premium Proxies and JS Rendering. Select Python as your preferred language and choose the API connection mode. Copy and paste the generated code into your scraper file:
The generated code should look like the following:
# pip install requests
import requests
url = "https://www.g2.com/products/asana/reviews"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
"url": url,
"apikey": apikey,
"js_render": "true",
"premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above code prints the protected website's full-page HTML, showing that your scraper bypasses its protection:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
<title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
<!-- ... -->
</head>
<body>
<!-- other content omitted for brevity -->
</body>
</html>
Forget about getting blocked! You've just built a scraper that bypasses anti-bots using ZenRows.
Conclusion
You've learned to set a custom User Agent with the Undetected ChromeDriver in Python. Using the appropriate User Agent header adds a human touch to your request, increasing the chances of bypassing anti-bots.
However, while manually rotating the User Agent works in some cases, it's challenging to maintain and may not bypass sophisticated anti-bot measures. We recommend using ZenRows, an all-in-one web scraping solution, to avoid anti-bots and scrape any website at scale without getting blocked.
Try ZenRows for free now without a credit card!