Using a fake user agent to mimic a real user during web scraping is a common technique to avoid getting blocked.
This article will demonstrate how to randomize User-Agent headers using Python's fake-useragent library.
What Is a Fake User Agent?
A fake user agent is a substitute for your HTTP client's default user agent string, making your scraper appear as a regular browser. Despite the term "fake", it can enhance your scraper's ability to access a website without being identified as a bot.
Most HTTP clients, like Python's Requests, send a default user agent that looks like the following, which is more prone to blocking:
python-requests/2.31.0
However, a regular user agent sent by a browser might look like the one below. It tells the website that the request comes from Chrome 114 on an Intel Mac running macOS 10.15.7, and that it's compatible with Safari and the Gecko rendering engine.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Replacing the default user agent with such a descriptive string eliminates the bot stigma and gives your scraper a solid browser footing. Let's see how to do that.
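To see what that replacement looks like in practice, here's a minimal sketch using the Requests library. The UA string is the example from above, and the httpbin.io URL is just a placeholder; the request is prepared locally without being sent so you can inspect the header:

```python
import requests

# Example browser User-Agent string (Chrome on macOS, as shown above)
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/114.0.0.0 Safari/537.36"
)

# Build the request locally without sending it, to inspect the headers
request = requests.Request(
    "GET", "https://httpbin.io/", headers={"User-Agent": browser_ua}
)
prepared = request.prepare()

# The default python-requests UA has been replaced with the browser string
print(prepared.headers["User-Agent"])
```

Hardcoding one string like this works, but every request then carries the same fingerprint, which is why rotation (covered next) is preferable.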
How to Use a Fake User Agent Library in Python
Manually tweaking the user agent is time-consuming, hard to maintain, and less efficient than automatic rotation. Thankfully, Python has a library called fake-useragent that lets you generate and randomize valid user agent strings on the fly.
Sounds cool? Let's see how to use it.
Step 1: Install and Set up the Tool
The first step is to install fake-useragent
via pip
from the command line:
pip install fake-useragent
Once installed, you're ready to start generating browser-like user agents.
Step 2: Generate Fake User Agents
The library is pretty straightforward to use: create an instance of the UserAgent class, then call its user agent attributes. For example, the following Python request rotates user agents across browsers and platforms using ua.random. It passes the generated string to the request headers dictionary before sending the request:
# Import the required libraries
import requests
from fake_useragent import UserAgent
# Instantiate the UserAgent class
ua = UserAgent()
# Get random user agents
random_ua = ua.random
# Specify the request URL
url = 'https://httpbin.io/'
# Pass the random user agents to the user-agent headers
request_headers = {
'user-agent': random_ua
}
# Make a get request to the URL and get the response
response = requests.get(url, headers=request_headers)
# Resolve response and print the user agent information
if response.status_code == 200:
print(response.request.headers['User-Agent'])
else:
print(response.status_code)
This outputs a random user agent per request, as shown by the results of several runs:
Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/117.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/117.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.31
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 OPR/101.0.0.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.75 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 OPR/102.0.0.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/118.0
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/116.0
What if you want to restrict the user agents to a specific browser instead? Using a similar approach, the Python request below only randomizes user agents from Chrome by calling ua.chrome.
# Import the required modules
import requests
from fake_useragent import UserAgent
# Instantiate the UserAgent class
ua = UserAgent()
# Randomize user agents from Chrome only
chrome_uas = ua.chrome
# Specify the request URL
url = 'https://httpbin.io/'
# Pass the random user agents to the user-agent headers
request_headers = {
'user-agent': chrome_uas
}
# Make a get request to the URL and get the response
response = requests.get(url, headers=request_headers)
# Resolve response and print the user agent information
if response.status_code == 200:
print(response.request.headers['User-Agent'])
else:
print(response.status_code)
The code outputs different Chrome versions per request like so:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
...
Excited about the outcome? Let's see more advanced ways to use the Python fake user agent library in web scraping.
Advanced Features of the fake-useragent Library
The advanced features of the fake user agent library are handy for adding more granularity to your scraper's user agents.
Let's quickly see how they work with the requests library in Python.
Custom Browser List
This feature helps if you want to limit the randomized user agents to certain browsers only and keep tabs on which browsers your scraper uses.
You can achieve this by instantiating the UserAgent class with the optional browsers parameter. The code below randomizes user agents between Safari and Chrome (UserAgent(browsers=['safari', 'chrome'])):
# Import the required modules
import requests
from fake_useragent import UserAgent
# Instantiate the UserAgent class with a browser list
ua = UserAgent(browsers=['safari', 'chrome'])
# Randomize the streamlined user agents
streamlined_uas = ua.random
# Specify the request URL
url = 'https://httpbin.io/'
# Pass the random user agents to the user-agent headers
request_headers = {
'user-agent': streamlined_uas
}
# Make a get request to the URL and get the response
response = requests.get(url, headers=request_headers)
# Resolve response and print the user agent information
if response.status_code == 200:
print(response.request.headers['User-Agent'])
else:
print(response.status_code)
This switches the user agent strings between Safari and Chrome versions per request, as seen below:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
...
Nice! It's also possible to randomize within a specific OS environment.
Operating System Specification
Besides browser-specific rotation, you can limit your scraper's requests to user agents from a single OS environment.
Similar to the previous example, this involves adding an os argument to the UserAgent instance. See the example below, which rotates user agents within the macOS environment using UserAgent(os='macos'):
# Import the required modules
import requests
from fake_useragent import UserAgent
# Instantiate the UserAgent class with a single OS
ua = UserAgent(os='macos')
# Randomize the streamlined user agents
streamlined_uas = ua.random
# Specify the request URL
url = 'https://httpbin.io/'
# Pass the random user agents to the user-agent headers
request_headers = {
'user-agent': streamlined_uas
}
# Make a get request to the URL and get the response
response = requests.get(url, headers=request_headers)
# Resolve response and print the user agent information
if response.status_code == 200:
print(response.request.headers['User-Agent'])
else:
print(response.status_code)
For each request, the code prints a different browser version within the macOS environment only:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/118.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/118.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Popularity Filtering
Using a popular browser during web scraping can reduce your chances of getting blocked. Popularity filtering in Python's fake-useragent lets you limit the user agent rotation to the most popular browsers only.
You can achieve this by specifying a min_percentage argument in the UserAgent instance (UserAgent(min_percentage=2.1)). Then pass the result to your user-agent header, as shown:
# Import the required modules
import requests
from fake_useragent import UserAgent
# Instantiate the UserAgent class with popularity filter
ua = UserAgent(min_percentage=2.1)
# Randomize the streamlined user agents
streamlined_uas = ua.random
# Specify the request URL
url = 'https://httpbin.io/'
# Pass the random user agents to the user-agent headers
request_headers = {
'user-agent': streamlined_uas
}
# Make a get request to the URL and get the response
response = requests.get(url, headers=request_headers)
# Resolve response and print the user agent information
if response.status_code == 200:
print(response.request.headers['User-Agent'])
else:
print(response.status_code)
Notice how the output below repeats some browser versions, confirming that you've limited the user agent rotation to browsers with a usage share of at least 2.1%:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0
The library also offers built-in error handling. Let's see how that works next.
Fallback Parameter
If the fake-useragent library fails to obtain a user agent at any point, its fallback option prevents a runtime error by using a backup user agent instead, keeping your scraper running without interruption.
As shown in the code below, you can pass a fallback parameter to the UserAgent instance to use a user agent of your choosing. The request reverts to the fall_back_ua string whenever the library fails to obtain a user agent.
Note: Since 100% popularity isn't achievable, we've applied it here to trigger the fallback:
# Import the required modules
import requests
from fake_useragent import UserAgent
# Specify a fallback user agent
fall_back_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
# Instantiate the UserAgent class with popularity filter
ua = UserAgent(min_percentage=100.0, fallback=fall_back_ua)
# Randomize the streamlined user agents
streamlined_uas = ua.random
# Specify the request URL
url = 'https://httpbin.io/'
# Pass the random user agents to the user-agent headers
request_headers = {
'user-agent': streamlined_uas
}
# Make a get request to the URL and get the response
response = requests.get(url, headers=request_headers)
# Resolve response and print the user agent information
if response.status_code == 200:
print(response.request.headers['User-Agent'])
else:
print(response.status_code)
Despite failing to obtain a random user agent, the request still goes through using the fallback user agent (fall_back_ua). The output includes a fallback notice, as shown:
Error occurred during getting browser: random, but was suppressed with fallback.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Congratulations! You've now implemented a fallback for your requests.
ZenRows: Rotate Your User Agents Automatically
Rather than configuring rotation manually with open-source libraries, which are updated less frequently, ZenRows automatically rotates user agents from a larger and more current pool.
Beyond that, ZenRows is a complete web scraping toolkit that helps you avoid getting blocked with premium proxies, anti-CAPTCHA measures, and much more.
Try ZenRows and scrape at scale with confidence.
Conclusion
In this article, you've learned how to use the fake-useragent library in Python to improve your web scraping efforts and avoid blocks through smart user-agent rotation.
However, the library has its limitations. For a straightforward way to manage user agents automatically and scrape without getting blocked, ZenRows stands out as the optimal solution.
Did you find the content helpful? Spread the word and share it on Twitter or LinkedIn.