
User Agent in Python Requests: How to Change It

March 27, 2023 · 8 min read

Have you ever attempted web scraping with the Python Requests library only to be blocked by your target website? You're not alone!

The User Agent (UA) string is one of the most critical factors in bot detection: it acts like a fingerprint identifying the client, so a default one easily gives you away as a bot.

However, you can fly under the radar and retrieve the data you want by randomizing the User Agent in Python Requests. You'll learn how to do that at scale in this tutorial.

What Is the User Agent in Python Requests?

The User Agent is a key component of the HTTP headers sent along with every HTTP request.

These headers carry information the server uses to tailor its response, such as the preferred content format and language. The UA header, in particular, tells the web server which operating system and browser are making the request, among other details.

For example, a Google Chrome browser may have the following string:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36


It tells the server the client uses the legacy Mozilla/5.0 product token, runs on 64-bit Windows 10, relies on the AppleWebKit rendering engine at version 537.36 (with the KHTML and Gecko tokens declaring compatibility with Konqueror's and Firefox's engines), identifies itself as Google Chrome 111, and ends with a Safari compatibility token.

Similarly, here's an example of a Firefox UA:

Mozilla/5.0 (X11; Linux i686; rv:110.0) Gecko/20100101 Firefox/110.0


And you see a mobile UA next:

Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148


Your Python Requests web scraper gets denied access because its default UA screams "I am a bot." Let's see that in action!

The following script sends an HTTP request to http://httpbin.org/headers.

program.py
import requests

response = requests.get('http://httpbin.org/headers')

print(response.status_code)
print(response.text)

Since HTTPBin's /headers endpoint echoes back the headers it receives as JSON, the code above reveals the default headers Requests sent:

Output
200
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.28.2",
    "X-Amzn-Trace-Id": "Root=1-64107531-3d822dab0eca3d0a07faa819"
  }
}

Take a look at the User-Agent string python-requests/2.28.2, and you'll agree that any website can tell it isn't from an actual browser. That's why you have to specify a custom and well-formed User Agent for Python Requests.
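If you'd like to confirm this locally without hitting HTTPBin, Requests also exposes its defaults directly. This is just a quick optional check, not something the rest of the tutorial depends on:

example.py
import requests

# The default UA Requests attaches when you don't override it,
# e.g. "python-requests/2.28.2" depending on your installed version.
print(requests.utils.default_user_agent())

# The full set of default headers a fresh Session sends.
print(requests.Session().headers)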

Set User Agent Using Python Requests

To set the Python Requests User Agent, pass a dictionary containing the new string in your request configuration.

Start by importing the Python Requests library, then create an HTTP headers dictionary containing the new User Agent, and send a request with your specified User-Agent string. The last step is printing the response.

program.py
import requests
 
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36'}
 
response = requests.get('http://httpbin.org/headers', headers=headers)
 
print(response.status_code)
print(response.text)

Note: You can find some well-formed User Agents for web scraping in our other article.

If you get a result similar to the example below, you've succeeded in faking your User Agent.

Output
200
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-6410831d-666f197b78c8e97a3013bea9"
  }
}

Use a Random User Agent with Python Requests

Reusing the same User Agent for every request makes your scraper easy to identify, so you should rotate through random User Agents in Python Requests to avoid getting blocked.

You can rotate between UAs using the random.choice() method. Here's a step-by-step example:

  1. Import random.
program.py
import random
  2. Create a list containing the User Agents you want to rotate.
program.py
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15'
]
  3. Create a headers dictionary, using random.choice() to pick the User-Agent string.
program.py
headers = {'User-Agent': random.choice(user_agents)}
  4. Include the random UA string in your request.
program.py
response = requests.get('https://www.example.com', headers=headers)
  5. Use a for loop to repeat steps three and four for multiple requests.
program.py
for i in range(7):
    headers = {'User-Agent': random.choice(user_agents)}
    response = requests.get('https://www.example.com', headers=headers)
    print(response.headers)

Putting it all together, your complete code will look like this:

program.py
import requests
import random

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15'
]

for i in range(7):
    headers = {'User-Agent': random.choice(user_agents)}
    response = requests.get('http://httpbin.org/headers', headers=headers)
    print(response.text)

And here's the result:

Output
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",     
    "X-Amzn-Trace-Id": "Root=1-64109cab-38fc3e707383ccc92fda9034"
  }
}
 
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
    "X-Amzn-Trace-Id": "Root=1-64109cac-3971637b5318a6f87c673747"
  }
}
 
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-64109cac-42301391639825ca0497d9a3"
  }
}
 
// ...

Awesome, right?
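If you're sending more than a few requests, you can also wrap the rotation in a small helper built on requests.Session so connections get reused. This is only a sketch: the get_with_random_ua helper is a hypothetical name introduced here, not part of the Requests API, and it assumes the user_agents list defined above.

example.py
import random
import requests

session = requests.Session()

def get_with_random_ua(url, user_agents, **kwargs):
    # Hypothetical helper: pick a fresh User Agent per request
    # while reusing the same underlying connections.
    headers = kwargs.pop('headers', {})
    headers['User-Agent'] = random.choice(user_agents)
    return session.get(url, headers=headers, **kwargs)

# Example usage with the list from the previous snippet:
# response = get_with_random_ua('http://httpbin.org/headers', user_agents)
# print(response.text)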

However, that approach only works for a handful of requests. Let's see how to do this at scale for web scraping.

How To Rotate Infinite User Agents at Scale

Creating a reliable User Agent rotation system is more complex than it seems. Beyond maintaining thousands of strings, you'll need to constantly update browser versions, validate OS compatibility, and remove suspicious combinations.
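For instance, a home-grown rotation system would need sanity checks along these lines. The snippet below is a deliberately naive sketch; the is_plausible_ua function and its rules are illustrative, not a complete validator.

example.py
import re

def is_plausible_ua(ua: str) -> bool:
    """Naive sanity check: flag a few obviously inconsistent OS/browser combinations."""
    # Safari hasn't shipped on Windows in years, so a Safari-style UA
    # (Version/ token, no Chrome/ token) on Windows looks suspicious.
    if 'Windows NT' in ua and 'Version/' in ua and 'Chrome/' not in ua:
        return False
    # A Chrome UA should carry a numeric version token.
    if 'Chrome/' in ua and not re.search(r'Chrome/\d+', ua):
        return False
    return True

# Example: filter the user_agents list from the previous section before rotating.
# user_agents = [ua for ua in user_agents if is_plausible_ua(ua)]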

Also, relying only on User Agents is not enough to bypass anti-bot systems. Modern websites look beyond the User Agent to identify automation, like checking your IP reputation, request patterns, header consistency, TLS fingerprinting, etc.
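For example, pairing the User Agent with an otherwise consistent set of browser-like headers addresses one of those signals. This is a hedged sketch with illustrative values; by itself it won't defeat TLS fingerprinting or IP-based checks.

example.py
import requests

# Headers loosely matching the Chrome UA below; mismatched combinations
# (e.g., a Chrome UA with no Accept-Language) are an easy giveaway.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.google.com/',
}

response = requests.get('http://httpbin.org/headers', headers=headers)
print(response.text)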

The most effective solution is to use a web scraping API like ZenRows. It provides auto-rotating, up-to-date User Agents, premium proxies, JavaScript rendering, CAPTCHA auto-bypass, and everything else you need to avoid getting blocked.

Let's see how ZenRows performs against a protected page like the Antibot Challenge page.

Start by signing up for a new account, and you'll get to the Request Builder.

building a scraper with zenrows

Paste the target URL, enable JS Rendering, and activate Premium Proxies.

Next, select Python and click on the API connection mode. Then, copy the generated code and paste it into your script.

scraper.py
# pip3 install requests
import requests

url = "https://www.scrapingcourse.com/antibot-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params, print(response.text)

The generated code uses Python's Requests library as the HTTP client. You can install this library using pip:

Terminal
pip3 install requests

Run the code, and you'll successfully access the page:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! 🎉 You’ve successfully bypassed the anti-bot challenge page using ZenRows. This works for any website.

Conclusion

In this guide, you've learned the essentials of managing User Agents with Python Requests:

  • Understanding what makes a well-formed User Agent string and why it matters.
  • Setting custom User Agents in your Python requests.
  • Implementing basic User Agent rotation with Python's random module.
  • Why User Agent management alone isn't enough for reliable web scraping.

Keep in mind that many websites use different anti-bot mechanisms to prevent web scraping. Integrate ZenRows to make sure you extract all the data you need without getting blocked. Try ZenRows for free!
