How to Use Fake User Agent for Web Scraping

December 15, 2023 · 4 min read

Using a fake user agent to mimic a real user during web scraping is a common technique to avoid getting blocked.

This article will demonstrate how to randomize User-Agent headers using Python's fake-useragent library.

What Is a Fake User Agent?

A fake user agent is a substitute for your default request user agent, making your scraper appear as a regular browser. Despite the term "fake", it can enhance your scraper's ability to access a website without being identified as a bot. 

Most HTTP clients, like Python's Requests library, send a default user agent that looks like the following, which is more prone to blocking:

Example
python-requests/2.31.0

However, a regular user agent sent by a browser might look like the one below. It tells the website that the visitor is running Chrome on an Intel-based Mac (macOS 10.15), using the AppleWebKit rendering engine and declaring compatibility with Safari and Gecko.

Example
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36

Replacing the default user agent with such a descriptive string removes the obvious bot signal and gives your scraper a solid browser footing. Let's see how to do that.
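To make the replacement concrete, here's a dependency-free sketch that attaches a browser-style User-Agent to a request object using Python's built-in urllib. The UA string is just an example value, and no request is actually sent:

```python
from urllib.request import Request

# A browser-style user agent string (example value)
browser_ua = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/114.0.0.0 Safari/537.36'
)

# Build a request carrying the browser-style header
# instead of urllib's default "Python-urllib/3.x"
req = Request('https://httpbin.io/', headers={'User-Agent': browser_ua})

# urllib stores header keys in capitalized form, e.g. 'User-agent'
print(req.get_header('User-agent'))
```

The same idea applies to any HTTP client: the user agent is just one more header you control.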

How to Use a Fake User Agent Library in Python

Manually tweaking the user agent is time-consuming, hard to maintain, and less efficient than automatic rotation. Thankfully, Python has a library called fake-useragent that lets you generate and randomize valid user agent strings on the fly.
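To see what the library saves you from, here's a rough hand-rolled equivalent: maintaining your own pool of user agent strings (the short list below is purely illustrative) and picking one at random per request:

```python
import random

# A small, illustrative pool; a real one needs constant upkeep
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) '
    'Gecko/20100101 Firefox/117.0',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
]

def random_headers():
    # Pick a different user agent for each request
    return {'user-agent': random.choice(USER_AGENTS)}

print(random_headers())
```

The drawback is obvious: the list goes stale as browsers release new versions, which is exactly what fake-useragent automates away.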

Sounds cool? Let's see how to use it.

Step 1: Install and Set up the Tool

The first step is to install fake-useragent via pip from the command line:

Terminal
pip install fake-useragent

Once installed, you're ready to start generating browser-like user agents. 

Step 2: Generate Fake User Agents

The library is straightforward to use: create an instance of the UserAgent class, then read the user agent attributes you need from it.

For example, the following script rotates user agents across browsers and platforms using ua.random. It passes the random string to the request headers dictionary before sending the request:

program.py
# Import the required libraries
import requests
from fake_useragent import UserAgent
 
# Instantiate the UserAgent class
ua = UserAgent()
 
# Get random user agents
random_ua = ua.random
 
# Specify the request URL
url = 'https://httpbin.io/'
 
# Pass the random user agents to the user-agent headers
request_headers = {
    'user-agent': random_ua
}
 
# Make a GET request to the URL and get the response
response = requests.get(url, headers=request_headers)
 
# Resolve response and print the user agent information
if response.status_code == 200:
    print(response.request.headers['User-Agent'])
else:
    print(response.status_code)

This outputs a different random user agent per request, as shown:

Output
Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/117.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/117.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.31
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 OPR/101.0.0.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.75 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 OPR/102.0.0.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/118.0
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/116.0

What if you want to limit the user agents to a specific browser instead? Using a similar approach, the script below randomizes only Chrome user agents by calling ua.chrome:

program.py
# Import the required modules
import requests
from fake_useragent import UserAgent
 
# Instantiate the UserAgent class
ua = UserAgent()
 
# Randomize user agents from Chrome only
chrome_uas = ua.chrome
 
# Specify the request URL
url = 'https://httpbin.io/'
 
# Pass the random user agents to the user-agent headers
request_headers = {
    'user-agent': chrome_uas
}
 
# Make a GET request to the URL and get the response
response = requests.get(url, headers=request_headers)
 
# Resolve response and print the user agent information
if response.status_code == 200:
    print(response.request.headers['User-Agent'])
else:
    print(response.status_code)

The code outputs different Chrome versions per request like so:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
...

Excited about the outcome? Let's see more advanced ways to use the Python fake user agent library in web scraping.


Advanced Features of the fake-useragent Library

The advanced features of the fake user agent library are handy for adding more granularity to your scraper's user agents.

Let's quickly see how they work with the requests library in Python.

Custom Browser List

This feature helps if you want to limit the randomized user agents to a few chosen browsers and keep tabs on which browsers your scraper uses.

You can achieve this by instantiating the UserAgent class with an optional browsers parameter. 

The code below randomizes the user agents between Safari and Chrome (UserAgent(browsers=['safari', 'chrome'])):

program.py
# Import the required modules
import requests
from fake_useragent import UserAgent
 
# Instantiate the UserAgent class with a browser list
ua = UserAgent(browsers=['safari', 'chrome'])
 
# Randomize the streamlined user agents
streamlined_uas = ua.random
 
# Specify the request URL
url = 'https://httpbin.io/'
 
# Pass the random user agents to the user-agent headers
request_headers = {
    'user-agent': streamlined_uas
}
 
# Make a GET request to the URL and get the response
response = requests.get(url, headers=request_headers)
 
# Resolve response and print the user agent information
if response.status_code == 200:
    print(response.request.headers['User-Agent'])
else:
    print(response.status_code)

This switches the user agent strings between Safari and Chrome versions per request, as seen below:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
...

Nice! And it’s also possible to randomize with a specific OS environment.

Operating System Specification

Besides browser-specific rotation, you can limit your scraper's requests to user agents from a single OS.

Similar to the previous example, this involves adding an os argument to the UserAgent instance. 

See the example below for rotating the user agents within the macOS environment using UserAgent(os='macos'):

program.py
# Import the required modules
import requests
from fake_useragent import UserAgent
 
# Instantiate the UserAgent class with a single OS
ua = UserAgent(os='macos')
 
# Randomize the streamlined user agents
streamlined_uas = ua.random
 
# Specify the request URL
url = 'https://httpbin.io/'
 
# Pass the random user agents to the user-agent headers
request_headers = {
    'user-agent': streamlined_uas
}
 
# Make a GET request to the URL and get the response
response = requests.get(url, headers=request_headers)
 
# Resolve response and print the user agent information
if response.status_code == 200:
    print(response.request.headers['User-Agent'])
else:
    print(response.status_code)

For each request, the code prints a different browser version within the macOS environment only:

Output
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/118.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/118.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36

Popularity Filtering

Using a popular browser during web scraping can reduce your chances of getting blocked. Popularity filtering in Python's fake-useragent allows you to limit the user agent rotation to the most popular browsers only. 
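Conceptually, the filter keeps only user agents whose real-world usage share meets a threshold, then randomizes within that reduced pool. A stdlib sketch with made-up usage percentages (the UA names and shares below are hypothetical):

```python
import random

# Hypothetical usage shares (percent) per user agent string
UA_USAGE = {
    'chrome-ua-example': 65.2,
    'firefox-ua-example': 3.1,
    'rare-browser-ua-example': 0.4,
}

def popular_uas(min_percentage):
    # Keep only user agents at or above the popularity threshold
    return [ua for ua, share in UA_USAGE.items() if share >= min_percentage]

# Rotate within the popular subset only
pool = popular_uas(2.1)
print(random.choice(pool))
```

With a 2.1% threshold, the rare browser never appears, which mirrors the repeated mainstream versions you'll see in the output below.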

You can achieve this by specifying a min_percentage argument in the UserAgent instance (UserAgent(min_percentage=2.1)), then using ua.random as before:

program.py
# Import the required modules
import requests
from fake_useragent import UserAgent
 
# Instantiate the UserAgent class with popularity filter
ua = UserAgent(min_percentage=2.1)
 
# Randomize the streamlined user agents
streamlined_uas = ua.random
 
# Specify the request URL
url = 'https://httpbin.io/'
 
# Pass the random user agents to the user-agent headers
request_headers = {
    'user-agent': streamlined_uas
}
 
# Make a GET request to the URL and get the response
response = requests.get(url, headers=request_headers)
 
# Resolve response and print the user agent information
if response.status_code == 200:
    print(response.request.headers['User-Agent'])
else:
    print(response.status_code)

Notice how some browser versions repeat in the output below, confirming that the rotation is now limited to browsers with at least 2.1% usage share:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0

There's also versatility for error handling. Let's see how that works next.

Fallback Parameter

If the fake-useragent library fails to obtain a user agent at any point, its fallback mechanism prevents a runtime error by substituting a backup user agent, keeping your scraper running instead of crashing mid-job.
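Under the hood, this is the familiar try/except fallback pattern. A minimal sketch, where get_random_ua is a hypothetical stand-in for whatever lookup might fail:

```python
FALLBACK_UA = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
)

def get_random_ua():
    # Stand-in for a lookup that can fail
    # (e.g. no user agent matched the configured filters)
    raise RuntimeError('no user agent matched the filters')

def safe_ua(fallback=FALLBACK_UA):
    # Return a random user agent, or the fallback if the lookup fails
    try:
        return get_random_ua()
    except Exception:
        return fallback

print(safe_ua())
```

The library applies the same idea internally, logging a warning when it has to fall back.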

As shown in the code below, you can pass a fallback parameter to the UserAgent instance to use your chosen agent. 

The request reverts to the fall_back_ua string whenever the library fails to obtain a user agent. 

Note: Since no user agent reaches 100% usage share, we set min_percentage=100.0 here to force the lookup to fail and trigger the fallback:

program.py
# Import the required modules
import requests
from fake_useragent import UserAgent
 
# Specify a fallback user agent
fall_back_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
 
# Instantiate the UserAgent class with an impossible popularity filter
ua = UserAgent(min_percentage=100.0, fallback=fall_back_ua)
 
# Randomize the streamlined user agents
streamlined_uas = ua.random
 
# Specify the request URL
url = 'https://httpbin.io/'
 
# Pass the random user agents to the user-agent headers
request_headers = {
    'user-agent': streamlined_uas
}
 
# Make a GET request to the URL and get the response
response = requests.get(url, headers=request_headers)
 
# Resolve response and print the user agent information
if response.status_code == 200:
    print(response.request.headers['User-Agent'])
else:
    print(response.status_code)

Despite failing to obtain a random user agent, the request still goes through using the fallback user agent (fall_back_ua). The library also logs a fallback notice, as shown:

Output
Error occurred during getting browser: random, but was suppressed with fallback.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36

Congratulations! You've now implemented a fallback for your requests. 

ZenRows: Auto-Rotate Your User Agents

Rather than configuring manual rotation with open-source libraries, which can lag behind real browser releases, ZenRows automatically rotates user agents from a larger, continuously updated pool.

Besides user agent rotation, ZenRows is a complete web scraping toolkit that helps you avoid getting blocked with premium proxies, anti-CAPTCHA measures, and more.

Try ZenRows and scrape at scale with confidence.

Conclusion

In this article, you've learned how to use the fake-useragent library in Python to improve your web scraping and avoid blocks through smart user agent rotation.

However, the library has its limitations, and ZenRows stands out as a straightforward way to manage user agents automatically and scrape without getting blocked.

Ready to get started?

Up to 1,000 URLs for free are waiting for you