Web scraping with Scrapy enables quick data extraction from the web, but developers frequently get blocked by target websites. The reason? Websites use a variety of measures to detect bots, such as inspecting the User-Agent request header.
In this tutorial, you'll learn why the Scrapy User Agent matters, how to change it, and how to auto-rotate it to increase your chances of scraping successfully.
Let's get started!
What Is a User Agent in Scrapy?
When making an HTTP request, the client sends additional data to the server, known as request headers, which contain information about the client and the expected response.
HTTP request headers are represented as key-value pairs, with the key being the header name and the value being the data associated with that header. For web scraping, the User Agent (UA) is a critical header: websites can flag you as a bot based on the information it carries, or they can notice that a single client is making an unusually large number of requests.
UAs typically include your operating system, browser version, and more. They look like this:
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
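Before we look at Scrapy's UA, here's a quick way to inspect the headers any HTTP client sends: request https://httpbin.io/headers, which echoes them back as JSON. A minimal sketch using the requests library (not part of Scrapy):
# A quick sanity check: httpbin.io/headers echoes back the headers you sent
import requests

response = requests.get("https://httpbin.io/headers")
# Note the default "User-Agent" is something like "python-requests/x.y.z",
# which instantly gives you away as a script rather than a browser
print(response.json())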
However, the default Python Scrapy User Agent is different.
What Is the Default User Agent in Scrapy?
Scrapy provides a default User-Agent header that is used when making HTTP requests to websites. It looks like this:
Scrapy/2.9.0 (+https://scrapy.org)
Here, 2.9.0 represents the Scrapy version.
As you can imagine, the default User Agent in Scrapy will get you flagged as a bot and blocked easily. Luckily, the library lets you customize your UA, and that's what we'll do next.
How Do You Set a Custom User Agent in Scrapy?
In this section, you'll learn how to customize your User Agent and then rotate it so that your Scrapy spider impersonates a real browser.
As a prerequisite, create a spider.
Start by installing Scrapy via pip if you don't have it yet:
pip install scrapy
Then, create a new Scrapy project (for example, name it scraper):
# Syntax: scrapy startproject <project_name>
scrapy startproject scraper
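For orientation, the command scaffolds a project skeleton roughly like this (the exact files can vary by Scrapy version); we'll touch settings.py and middlewares.py later in this tutorial:
scraper/
├── scrapy.cfg
└── scraper/
    ├── __init__.py
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py
    └── spiders/
        └── __init__.py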
Now, create a basic spider and take https://httpbin.io/user-agent as a target URL to check your new UA later. For that, use the genspider command in your project directory:
# Syntax: scrapy genspider <spider_name> <target_website>
scrapy genspider headers httpbin.io/user-agent
Update the content of the spider file to log the response of your target URL:
import scrapy


class HeadersSpider(scrapy.Spider):
    name = 'headers'
    allowed_domains = ['httpbin.io']
    start_urls = ['https://httpbin.io/user-agent']

    def parse(self, response):
        self.log(f'RESPONSE: {response.body}')
Run the spider in your project root:
# Syntax: scrapy crawl <spider_name>
scrapy crawl headers
You'll get a response like the one below in the Scrapy logs in your terminal:
"user-agent": "Scrapy/2.9.0 (+https://scrapy.org)"
This is the response from HTTPBin that displays our starting point: the default Scrapy User Agent. Let's change it.
Step 1: Generate a Random UA
Using a realistic User Agent is essential, yet many scrapers get the small details wrong (for example, a Chrome version that doesn't match the rest of the string), which makes them easier to detect.
Check out our guide for an in-depth technical overview, or just grab a list of working UAs from it. For now, we'll use one that we grabbed there:
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
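Alternatively, if you'd rather not maintain UA strings by hand, a third-party package such as fake-useragent (our suggestion; it's not part of Scrapy) can generate current ones for you:
# pip install fake-useragent
from fake_useragent import UserAgent

ua = UserAgent()
# .random returns a random, reasonably current browser UA string
print(ua.random)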
Step 2: Set a New User Agent in Scrapy with Middleware
You can set a new UA in Scrapy in one of two ways: globally for all spiders in your project or locally for specific spiders. We recommend the global approach, but we'll show you both.
Global Scope
To set the User Agent globally, specify the USER_AGENT variable in the settings.py file located in the scraper folder (the inner project package) of your project's root directory:
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
Run the crawl command again, and you should see your custom UA in the logged response this time:
"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
Alternatively, you can specify the User-Agent key of the DEFAULT_REQUEST_HEADERS dictionary in the same settings file to get the same result:
DEFAULT_REQUEST_HEADERS = {
    # ... Other headers
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
}
Local Scope
If you want to set the User Agent for a specific spider instead, specify USER_AGENT in the custom_settings dictionary of the spider class in question:
# ...

class HeadersSpider(scrapy.Spider):
    name = 'headers'
    allowed_domains = ['httpbin.io']
    start_urls = ['https://httpbin.io/user-agent']

    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    }

    def parse(self, response):
        yield {'response': response.body}
You should get the same result as with the global scope approach.
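Scrapy also lets you go one level finer and set the header on a single request via the headers argument of scrapy.Request, which is useful when only some URLs need a browser-like UA. A minimal sketch (the spider name here is hypothetical):
import scrapy


class PerRequestUASpider(scrapy.Spider):
    name = 'per_request_ua'

    def start_requests(self):
        # The headers argument overrides the UA for this request only
        yield scrapy.Request(
            'https://httpbin.io/user-agent',
            headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'},
        )

    def parse(self, response):
        yield {'response': response.body}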
Step 3: Build a User Agent Rotator in Scrapy
Rotating UAs is crucial to avoid being identified as the single source of many requests and therefore being blocked. So, let's look into how to do that.
We'll use Scrapy's built-in downloader middleware feature to create a User Agent middleware that selects a random UA from a pool for each request.
Start by grabbing a couple of extra UAs from our list of User Agents for web scraping that you'll rotate through with Scrapy. Then, create a new USER_AGENTS list in your settings.py file and include your chosen UAs:
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
]
Next, create a new rotator middleware in your middlewares.py file. We leverage the process_request() method because Scrapy calls it for every outgoing request and gives it access to the request object:
# ...
import random
# ...


class UARotatorMiddleware:
    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Read the UA pool from settings.py
        user_agents = crawler.settings.get('USER_AGENTS', [])
        return cls(user_agents)

    def process_request(self, request, spider):
        # Pick a random UA for every outgoing request
        user_agent = random.choice(self.user_agents)
        request.headers['User-Agent'] = user_agent
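As an optional safeguard (our addition, not required by the tutorial), you can make from_crawler raise Scrapy's NotConfigured exception when the UA pool is empty, so the middleware is cleanly disabled instead of random.choice() failing at request time:
from scrapy.exceptions import NotConfigured

# Variant of from_crawler that disables the middleware
# when no USER_AGENTS are configured in settings.py:
@classmethod
def from_crawler(cls, crawler):
    user_agents = crawler.settings.get('USER_AGENTS', [])
    if not user_agents:
        raise NotConfigured('USER_AGENTS is empty, UARotatorMiddleware disabled')
    return cls(user_agents)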
Now, register the middleware in the DOWNLOADER_MIDDLEWARES setting of your settings.py file. The number (400 here) determines its order relative to other downloader middlewares; lower values process requests first:
# ...
DOWNLOADER_MIDDLEWARES = {
    # ... Other middlewares
    'scraper.middlewares.UARotatorMiddleware': 400,
}
Finally, update your headers spider to look like this:
import scrapy


class HeadersSpider(scrapy.Spider):
    name = "headers"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/user-agent"]

    def parse(self, response):
        self.logger.info(f'RESPONSE: {response.text}')
To test that it works, run the scrapy crawl headers command about three times, and you should get responses similar to these in your logs:
2023-07-16 17:01:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://httpbin.io/user-agent> (referer: None)
2023-07-16 17:01:45 [headers] INFO: RESPONSE: {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}
====
2023-07-16 17:04:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://httpbin.io/user-agent> (referer: None)
2023-07-16 17:04:02 [headers] INFO: RESPONSE: {
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}
====
2023-07-16 17:04:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://httpbin.io/user-agent> (referer: None)
2023-07-16 17:04:29 [headers] INFO: RESPONSE: {
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}
Great! A random UA from your list was applied to each request, which confirms that your Scrapy User Agent rotator is working with the strings you set manually. You're better equipped as a web scraper now.
How To Rotate Infinite User Agents at Scale
Creating a reliable Scrapy User Agent rotation system takes more work than just building a middleware. You need to keep updating your list with new browser versions, ensure operating systems match correctly, and remove outdated combinations.
Additionally, websites look beyond User Agents to detect bots. They analyze your request patterns, header consistency, and browser behavior. Even with perfect User Agent rotation in Scrapy, your spiders might still get blocked.
The most effective solution is to use a web scraping API like ZenRows. It provides auto-rotating, up-to-date User Agents, premium proxies, JavaScript rendering, CAPTCHA auto-bypass, and everything else you need to avoid getting blocked.
Let's see how ZenRows performs against a protected page like the Antibot Challenge page.
Start by signing up for a new account, and you'll get to the Request Builder.

Paste the target URL, enable JS Rendering, and activate Premium Proxies.
Next, select Python and click on the API connection mode. Then, copy the generated code and paste it into your script.
# pip3 install requests
import requests

url = "https://www.scrapingcourse.com/antibot-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The generated code uses Python's Requests library as the HTTP client. You can install this library using pip:
pip3 install requests
Run the code, and you'll successfully access the page:
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>You bypassed the Antibot challenge! :D</h2>
    <!-- other content omitted for brevity -->
</body>
</html>
Congratulations! 🎉 You’ve successfully bypassed the anti-bot challenge page using ZenRows. This works for any website.
Conclusion
In this guide, you've learned important things about User Agents in Scrapy:
- What User Agents are and why they matter.
- How to set custom User Agents globally and locally.
- Ways to rotate between different User Agents.
- Why User Agent management alone isn't enough.
Remember that websites use many techniques to detect spiders. Integrate ZenRows to make sure you extract all the data you need without getting blocked. Try ZenRows for free!