Have you ever been blocked from scraping your desired content due to an IP ban or geo-restrictions? There's a solution. Proxies let you mask your original IP, rotate addresses to avoid detection, and access geo-restricted data.
In this step-by-step tutorial, you'll learn everything you need to know about setting up a Scrapy proxy to improve your web scraping success.
How to Set up a Proxy With Scrapy
You can set up a proxy in Scrapy by adding a meta parameter to your request on the fly or using a custom middleware. Let's explore both options.
To see how each method works, you'll grab free proxies from the Free Proxy List website and confirm they work by requesting HTTPBin's IP endpoint.
The free proxies used in this tutorial will likely not work at the time of reading because of their short lifespan. Feel free to grab a new one from the Free Proxy List website.
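Before plugging a proxy into Scrapy, you can quickly confirm it's alive with a plain Python check. Here's a minimal sketch using the requests library; the address below is the sample proxy used throughout this tutorial:
# pip3 install requests
import requests

proxy = "http://66.191.31.158:80"

try:
    # route the request through the proxy for both HTTP and HTTPS
    response = requests.get(
        "https://httpbin.io/ip",
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
    # a live proxy prints its own IP, not yours
    print(response.text)
except requests.RequestException as error:
    print(f"proxy check failed: {error}")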
Now, let's begin with the meta parameter option.
Method 1: Add a Meta Parameter
This method involves passing your proxy address as a meta parameter in the scrapy.Request() method.
Once you have your proxy address and port number, pass them into your Scrapy request, as shown in the following code block.
yield scrapy.Request(
    url=url,
    callback=self.parse,
    meta={"proxy": "http://66.191.31.158:80"},
)
Here's what you get by updating your spider:
# pip3 install scrapy
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={"proxy": "http://66.191.31.158:80"},
            )

    def parse(self, response):
        print(response.text)
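If your spider lives inside a Scrapy project (for example, one generated with scrapy startproject myproject), run it from the project folder with:
scrapy crawl scraper
For a standalone file, scrapy runspider scraper.py works as well, assuming you saved the spider as scraper.py.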
The spider returns the proxy's IP address, as shown:
{
    "origin": "66.191.31.158:38710"
}
Nice! You've successfully integrated a proxy in Scrapy. Next, you'll see how the custom middleware option works.
Method 2: Create a Custom Middleware
Scrapy middleware is an intermediary layer for processing, modifying, and filtering requests and responses before they reach the spider. The middleware proxy option offers an excellent way to manage multiple spiders, as you can manipulate proxy credentials without modifying your code.
Open the middlewares.py file and create a new CustomProxyMiddleware class.
The class uses the process_request method to check whether a request already has a proxy in its meta attribute. If not, it assigns one via the get_proxy method, which reads the proxy address (PROXY_ADDRESS) from settings.py. Requests that already specify a proxy in their meta attribute pass through unchanged:
class CustomProxyMiddleware(object):
    def __init__(self):
        self.proxy = None

    def process_request(self, request, spider):
        if "proxy" not in request.meta:
            request.meta["proxy"] = self.get_proxy(spider.crawler)

    def get_proxy(self, crawler):
        self.proxy = crawler.settings.get("PROXY_ADDRESS")
        return self.proxy
Go to settings.py and add your middleware to the DOWNLOADER_MIDDLEWARES options. Then, include a PROXY_ADDRESS variable and assign your proxy address to it:
# ...
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.CustomProxyMiddleware": 350,
    # ...,
}

# include your proxy address
PROXY_ADDRESS = "http://66.191.31.158:80"
The integer in the DOWNLOADER_MIDDLEWARES configuration is an order number that specifies the middleware's position in the execution chain. An order number of 350 means your custom middleware runs after middleware with lower order numbers and before those with higher ones.
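For context, Scrapy's built-in HttpProxyMiddleware, the component that actually applies request.meta["proxy"] to outgoing requests, is enabled by default at priority 750. Keeping your custom middleware at a lower number, as sketched below, ensures it sets the proxy before the built-in middleware applies it:
DOWNLOADER_MIDDLEWARES = {
    # runs first: a lower order number means earlier in the request chain
    "myproject.middlewares.CustomProxyMiddleware": 350,
    # Scrapy's built-in proxy handler is already enabled at priority 750:
    # "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}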
You can also add middleware at the spider level using custom settings:
class ScraperSpider(scrapy.Spider):
    # ...
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "myproject.middlewares.CustomProxyMiddleware": 350,
        },
        # include your proxy address
        "PROXY_ADDRESS": "http://66.191.31.158:80",
    }
You can now run the spider without the meta parameter. The modified spider looks like this:
# pip3 install scrapy
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
            )

    def parse(self, response):
        print(response.text)
You'll see that the code returns the IP address of the proxy you set inside settings.py:
{
    "origin": "66.191.31.158:38710"
}
That works! You've mastered how to create a custom proxy middleware in Scrapy.
Both the meta and middleware approaches yield the same results. However, the rest of this tutorial uses the first technique, which is more straightforward.
Proxy Authentication in Scrapy
Paid proxies often require authentication credentials, such as a username and password, which are usually part of the proxy address.
An authenticated proxy address takes the following format:
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_ADDRESS>:<PROXY_PORT>
Adding an authenticated proxy works the same way as an unauthenticated one. You only need to pass the proxy address containing your credentials as a meta parameter, as shown:
# pip3 install scrapy
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={
                    "proxy": "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>"
                },
            )

    def parse(self, response):
        print(response.text)
That was easy! Your Scrapy spider can now use an authenticated proxy.
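If you'd rather keep credentials out of your spider code, the custom middleware from Method 2 handles authenticated proxies too. Store the full authenticated URL in settings.py, and the middleware will attach it to every request:
# settings.py
# the CustomProxyMiddleware from Method 2 reads this value
PROXY_ADDRESS = "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>"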
How to Use Rotating Proxies With Scrapy
Websites often flag excessive requests from a single IP address as suspicious and may block or ban it. So, you can still get blocked if you stick to a single proxy, especially for multiple requests.
You can avoid IP bans by distributing traffic over several IPs using proxy rotation. Proxy rotation lets you change your IP address per request, so the website treats each request as a different user.
This technique is handy for bypassing detection methods like Cloudflare's rate limiting during large-scale scraping.
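To see what rotation boils down to before adding a dependency, here's a minimal do-it-yourself sketch that builds on the custom middleware from Method 2. The SimpleProxyRotationMiddleware class and PROXY_LIST setting are hypothetical names used for illustration, not part of Scrapy:
import random


class SimpleProxyRotationMiddleware(object):
    def process_request(self, request, spider):
        # pick a random proxy from the (hypothetical) PROXY_LIST setting
        # for each outgoing request that doesn't already have one
        proxies = spider.crawler.settings.getlist("PROXY_LIST")
        if proxies and "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(proxies)
A dedicated package goes further, for example by detecting bans and retiring dead proxies, which is why this tutorial uses one.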
To rotate proxies in Scrapy, you'll use the scrapy-rotating-proxies third-party middleware. First, install the package using pip:
pip3 install scrapy-rotating-proxies
Grab more free proxies from the previous website (Free Proxy List). Create a new rotating_proxy_list.txt file in your project root folder (at the same level as scrapy.cfg) and list your proxy addresses in that file:
http://23.247.137.142:80
http://91.92.155.207:3128
http://8.215.108.194:7777
http://34.199.10.221:8081
# ...
Enable the middleware by adding it to the DOWNLOADER_MIDDLEWARES settings in the settings.py file. Since this middleware handles both single and multiple proxies, you can replace the previous custom middleware with it. Finally, specify the rotating_proxy_list.txt path:
# ...
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 350,
    # ...
}

# specify the rotating proxy list path
ROTATING_PROXY_LIST_PATH = "<PATH_TO>/rotating_proxy_list.txt"
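If you prefer to keep everything in one place, the package also reads proxies directly from a ROTATING_PROXY_LIST setting, which replaces the separate file:
# settings.py: inline alternative to ROTATING_PROXY_LIST_PATH
ROTATING_PROXY_LIST = [
    "http://23.247.137.142:80",
    "http://91.92.155.207:3128",
    # ...
]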
Now, send a request to HTTPBin without the meta parameter:
# pip3 install scrapy scrapy-rotating-proxies
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
            )

    def parse(self, response):
        print(response.text)
The spider will now use random proxies from rotating_proxy_list.txt. Here's a sample result for three consecutive requests:
# request 1
{
    "origin": "91.92.155.207:77628"
}

# request 2
{
    "origin": "34.199.10.221:36563"
}

# request 3
{
    "origin": "23.247.137.142:45731"
}
Congratulations! You now know how to rotate proxies in Scrapy to avoid getting blocked while scraping.
However, free proxies are only suitable for testing and not for real-life projects. Premium proxies, on the other hand, offer high uptime, and most services have an automatic proxy rotation feature.
Premium Proxy to Avoid Getting Blocked
Free proxies present significant challenges, such as rapid blocking, unstable performance, low IP reputation, and security concerns, making them unsuitable for professional web scraping operations.
A premium proxy solution delivers reliable protection against blocking and detection. Using services with automated IP rotation and geo-targeting capabilities can dramatically improve your scraping effectiveness.
ZenRows' Residential Proxies, the best premium proxy service, offers a residential proxy network featuring over 55M+ IPs spanning 185+ countries. It's the best solution to power reliable scraping operations with features like dynamic IP rotation, intelligent proxy selection, geo-targeting, enterprise-grade uptime, and more.
Let's integrate ZenRows' Residential Proxies with Scrapy.
First, sign up for an account and access the Proxy Generator dashboard.

This will provide your essential credentials (username, password) and proxy server details (proxy domain and proxy port). Replace the placeholders with your generated credentials:
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={"proxy": "http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337"},
            )

    def parse(self, response):
        self.logger.info(f"{response.text}")
When you run this spider multiple times, you'll see output similar to this:
# request 1
{
    "origin": "69.244.164.205:37774"
}

# request 2
{
    "origin": "77.242.86.124:53890"
}
Excellent! The results show your Scrapy requests are being routed through the ZenRows proxy network.
Each request displays a unique IP address, confirming that the automatic rotation system is working correctly. Your spider is now routed through high-quality proxies that significantly reduce the risk of detection.
Conclusion
For any data extraction project, you'll need to get around detection mechanisms, and a Scrapy proxy plays a key role. By routing your requests through it, you can hide your IP address and avoid getting blocked.
Now, you know how to set proxies in Scrapy. However, as free proxies are often unreliable, you should consider a reliable solution like ZenRows. Try ZenRows for free!