Although the Requests library makes it easier to make HTTP requests in Python, failed requests are frequent due to network connection issues or other reasons. This tutorial introduces the different causes of failure and teaches you how to build Python Requests retry logic to attempt your requests again.
The two main methods we'll cover are using an existing retry wrapper (Python Sessions with HTTPAdapter) and coding your own retry wrapper.
What to Know to Build the Python Requests Retry Logic
Should retries be attempted in all cases or only in specific scenarios? When is the appropriate time to retry, and how many attempts should be made?
In this section, we'll answer those questions and provide code examples to help you build Python Requests retry logic for a web crawler.
Types of Failed Requests
Understanding the reasons behind a failed request will allow you to develop strategies to deal with each case. Essentially, there are two types: requests that timed out (there's no response from the server) and requests that returned an error.
Let's see each one.
Timed out
A request may time out, resulting in no response from the server. That can happen for several reasons, such as overloaded servers, problems with how the server responds, or slow network connections.
When faced with timeout scenarios, consider checking your internet connection, as a stable connection may suggest the problem is server-related.
You can catch exceptions related to timeouts, such as requests.Timeout, and implement a Python retry mechanism conditionally or with strategies like exponential backoff. We'll look at these later on.
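As a quick preview, here's a minimal sketch of catching a timeout and retrying a fixed number of times (the URL and the attempt count are illustrative):
import requests

url = 'https://scrapingcourse.com/ecommerce/'
response = None

for attempt in range(3):
    try:
        # Give up on the attempt if the server doesn't respond within 5 seconds
        response = requests.get(url, timeout=5)
        break  # got a response, stop retrying
    except requests.Timeout:
        print(f'Attempt {attempt + 1} timed out, retrying...')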
Returned an Error
When a request is unsuccessful, it'll most often return an error response, which typically comes with a specific status code and an error message. The first tells what went wrong, and the second includes additional information that can provide insights into the actual problem. For instance:
404 Not Found
Your first approach to addressing this scenario is to review both the status code and error message while ensuring that the request is properly formed. If you suspect that the error results from a temporary problem or server issues, you may retry the request with caution.
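For instance, here's a minimal sketch of inspecting a failed response before deciding whether to retry (the missing path is just an example):
import requests

response = requests.get('https://scrapingcourse.com/ecommerce/this-page-does-not-exist')

if not response.ok:
    # The status code says what went wrong; the body often carries extra details
    print(response.status_code, response.reason)
    print(response.text[:200])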
Status Codes for a Python Requests Retry Loop
The errors in client-server communication fall into the 4xx and 5xx status code ranges. They include:
- 400 Bad Request.
- 401 Unauthorized.
- 403 Forbidden.
- 404 Not Found.
- 405 Method Not Allowed.
- 408 Request Timeout.
- 429 Too Many Requests.
- 500 Internal Server Error.
- 501 Not Implemented.
- 502 Bad Gateway.
- 503 Service Unavailable.
- 504 Gateway Timeout.
- 505 HTTP Version Not Supported.
The most common ones you'll see while web scraping are:
| Error Code | Explanation |
|---|---|
| 403 Forbidden | The server understands the request but won't fulfill it because the client lacks the right permissions or access. |
| 429 Too Many Requests | The server has received too many requests from the same IP within a given time frame; this is the rate limiting you often face in web scraping. |
| 500 Internal Server Error | A generic server error occurred, indicating that something went wrong on the server while processing the request. |
| 502 Bad Gateway | The server acting as a gateway or proxy received an invalid response from an upstream server. |
| 503 Service Unavailable | The server is too busy or undergoing maintenance and can't handle the request right now. |
| 504 Gateway Timeout | An upstream server didn't respond quickly enough to the gateway or proxy. |
You can check out the MDN docs for more information on HTTP response status codes.
Number of Retries
Setting the number of retries for a failed request depends on several considerations, such as the type of request error and the response time. Temporary errors, like 429 Too Many Requests, warrant more retries than errors that aren't temporary.
While there's no single best maximum number of retries, it's recommended to set a reasonable limit to avoid indefinite retries and potential performance issues. You can start with small values like three or five.
Delay
Delays between requests should be set to prevent websites and APIs from becoming overloaded and to maintain compliance with rate limits.
Fixed or Random Delay
A fixed delay between requests can be introduced using the time.sleep() function from the time module. To add randomness to the delay, you can combine time.sleep() with the random module.
Just like with the number of retries, there isn't a rule set in stone for how long the delay should be, but you can experiment with reasonable values starting around 300 ms to find an optimal balance.
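Here's a minimal sketch of both approaches (the 0.3-second baseline mirrors the suggestion above; the exact values are up to you):
import random
import time

import requests

url = 'https://scrapingcourse.com/ecommerce/'

# Fixed delay: wait 0.3 seconds before the next request
time.sleep(0.3)
response = requests.get(url)

# Random delay: wait somewhere between 0.3 and 1 second
time.sleep(random.uniform(0.3, 1))
response = requests.get(url)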
Backoff Strategy for the Delays
The backoff strategy is a commonly used technique for setting increasing delays between retries instead of fixed or random ones. Each retry doubles the delay, scaled by a backoff factor that's usually greater than one. This approach helps handle temporary issues while avoiding overloading servers.
The backoff algorithm is this:
backoff_factor * (2 ** (current_number_of_retries - 1))
For example, here are the delay sequences for backoff factors 2, 3, and 10:
# 2
1, 2, 4, 8, 16, 32, 64, 128
# 3
1.5, 3, 6, 12, 24, 48, 96, 192
# 10
5, 10, 20, 40, 80, 160, 320, 640
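Here's a small sketch that reproduces those sequences using the formula above (the backoff_delays helper is just for illustration):
def backoff_delays(backoff_factor, retries=8):
    # backoff_factor * (2 ** (current_number_of_retries - 1)) for each retry
    return [backoff_factor * (2 ** (n - 1)) for n in range(retries)]

print(backoff_delays(2))   # [1.0, 2, 4, 8, 16, 32, 64, 128]
print(backoff_delays(3))   # [1.5, 3, 6, 12, 24, 48, 96, 192]
print(backoff_delays(10))  # [5.0, 10, 20, 40, 80, 160, 320, 640]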
Best Methods to Retry Python Requests
In this section, we'll look at the best methods to retry Python Requests. They include:
- Use an existing retry wrapper: Python Sessions with HTTPAdapter.
- Code your retry wrapper.
We recommend the first one, but the second one might be suitable in some scenarios.
Method 1: Use an Existing Retry Wrapper: Python Sessions with HTTPAdapter
Python Requests uses the urllib3 HTTP client under the hood. With Requests' HTTPAdapter class and the Retry utility class from the urllib3 package, you can set up retries in Python. The HTTPAdapter class lets you specify a retry strategy and also change the behavior of requests.
Retry on Failure
To implement the Python Requests retry logic in case of failure, start by defining the retry options. Here, we set the maximum number of retries to 4 and specify that the request should only be reattempted if the error has a status code of 429, 500, 502, 503, or 504:
# Define the retry strategy
retry_strategy = Retry(
    total=4,  # Maximum number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry on
)
The retry strategy is passed to the HTTPAdapter when creating a new adapter object. The adapter is then mounted to a session object, which is used to make all requests.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry
# Define the retry strategy
retry_strategy = Retry(
    total=4,  # Maximum number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry on
)
# Create an HTTP adapter with the retry strategy and mount it to session
adapter = HTTPAdapter(max_retries=retry_strategy)
# Create a new session object
session = requests.Session()
session.mount('http://', adapter)
session.mount('https://', adapter)
# Make a request using the session object
response = session.get('https://scrapingcourse.com/ecommerce/')
if response.status_code == 200:
    print(f'SUCCESS: {response.text}')
else:
    print("FAILED")
Sessions and HTTPAdapter with a Backoff Strategy
To use the backoff strategy to set increasing delays between retries, add the backoff_factor parameter to the retry strategy:
# ...
# Define the retry strategy
retry_strategy = Retry(
    total=4,  # Maximum number of retries
    backoff_factor=2,  # Exponential backoff factor (e.g., 2 means 1, 2, 4, 8 seconds, ...)
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry on
)
# ...
Method 2: Code Your Retry Wrapper
Unlike the previous option, this time we'll create a custom wrapper for the retry logic. That way, you have the flexibility to implement custom error handling, logging, and more.
Python Requests: Retry on Failure
To keep it simple, let's create a Python function named retry_request that replicates the retry logic of method 1. It takes the target URL as its first argument, then total for the number of retries and status_forcelist to specify the status codes for which to retry the request.
import requests

def retry_request(url, total=4, status_forcelist=[429, 500, 502, 503, 504], **kwargs):
    # Make the number of requests required
    for _ in range(total):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in status_forcelist:
                # Retry request
                continue
            return response
        except requests.exceptions.ConnectionError:
            pass
    return None

response = retry_request('https://scrapingcourse.com/ecommerce/')
print(response.text)
Retry Python Requests with a Backoff Strategy
To retry Python Requests with a backoff strategy, take the previous code as a base. Then, create a separate function named backoff_delay to calculate the delay and use the time.sleep() function to apply it, like this:
def backoff_delay(backoff_factor, attempts):
    # Backoff algorithm
    delay = backoff_factor * (2 ** attempts)
    return delay
Using the backoff_delay function, you'll have the following:
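A minimal sketch, assuming the same defaults as the earlier retry_request and a backoff factor of 2:
import time

import requests

def backoff_delay(backoff_factor, attempts):
    # Backoff algorithm
    return backoff_factor * (2 ** attempts)

def retry_request(url, total=4, backoff_factor=2, status_forcelist=[429, 500, 502, 503, 504], **kwargs):
    for attempt in range(total):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in status_forcelist:
                # Wait before retrying the request
                time.sleep(backoff_delay(backoff_factor, attempt))
                continue
            return response
        except requests.exceptions.ConnectionError:
            time.sleep(backoff_delay(backoff_factor, attempt))
    return None

response = retry_request('https://scrapingcourse.com/ecommerce/')
if response is not None:
    print(response.text)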
Avoid Getting Blocked by Error 403 with Python Requests
Getting blocked because you're identified as a bot is the biggest problem when crawling. Some websites may block your IP address or take other measures to prevent you from accessing the site if they detect you as a bot.
To prove this, let's attempt to scrape a protected page on G2.com:
import requests
response = requests.get('https://www.g2.com/products/notion/reviews')
print(response.text, response.status_code)
Run the code, and you'll get a response like the one below, indicating that you were blocked with a 403 error:
// ...
<div class="cf-main-wrapper" role="main">
<div class="cf-header cf-section">
<div class="cf-error-title">
<h1>Access denied</h1>
<span class="cf-code-label">Error code <span>1020</span></span>
</div>
<div class="cf-error-description">
<p>You do not have access to www.g2.com.</p><p>The site owner may have set restrictions that prevent you from accessing the site.</p>
</div>
</div>
</div>
// ...
403
In a scenario like the one we just encountered, web scraping proxies may be considered a solution. However, proxies aren't enough, and specific libraries and customizations are unlikely to keep up with the evolving anti-bot systems. That's why many developers use a web scraping API as a dedicated toolkit to avoid getting blocked.
ZenRows is a popular web scraping API that ships with features such as premium proxies, rotating headers, and much more. And it can be integrated with Python Requests. Let's see it in action!
Sign up on ZenRows to get your free API key, which you'll use to scrape through API requests. You'll get to the Request Builder, where you should activate the Premium Proxies and JS Rendering boost mode as a recommended configuration. Then, click on the Python tab.
You'll get your scraper code on the right.
import requests
url = 'https://www.g2.com/products/notion/reviews'
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&[email protected]:8001'
proxies = {'http': proxy, 'https': proxy}
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)
Replace <YOUR_ZENROWS_API_KEY>
with the API key from your dashboard and run the code. This time, you'll get the page's HTML as a successful response.
Great! You've been able to scrape the Notion reviews page with the help of ZenRows.
Best Practice: Retry Python Requests with a Decorator
Using a decorator to implement retries is a cleaner approach, as the Python Requests retry logic contained in the decorator can be easily applied to multiple methods or functions.
Instead of implementing the decorator yourself, you can use Tenacity, a community-maintained package that simplifies the process of adding retry behavior to requests.
Start by installing Tenacity:
pip install tenacity
The retry decorator from Tenacity takes arguments like stop for the maximum number of retries and wait for the delay strategy between attempts, among others. Adding reraise=True makes Tenacity re-raise the last exception instead of wrapping it in a RetryError, so a regular except block can catch it.
# Define the retry decorator
@retry(
    stop=stop_after_attempt(4),  # Maximum number of retries
    wait=wait_exponential(multiplier=1, min=1, max=60),  # Exponential backoff
    reraise=True  # Re-raise the last exception instead of a RetryError
)
Here it is implemented in a scraper:
import requests
from tenacity import retry, stop_after_attempt, wait_exponential
# Define the retry decorator
@retry(
    stop=stop_after_attempt(4),  # Maximum number of retries
    wait=wait_exponential(multiplier=1, min=1, max=60),  # Exponential backoff
    reraise=True  # Re-raise the last exception instead of a RetryError
)
def make_request():
    return requests.get('https://scrapingcourse.com/ecommerce/')

try:
    response = make_request()
    print(f'SUCCESS: {response.text}')
except requests.RequestException as e:
    print(f'FAILED: {e}')
POST Retry with Python Requests
In addition to the GET requests we've been using, you can retry other HTTP methods, such as POST for creating new resources on the server and PUT for updating existing ones, for example, when submitting a form.
You can use Tenacity to retry a POST request by replacing requests.get with requests.post and passing the form data:
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# Define the retry decorator
@retry(
    stop=stop_after_attempt(4),  # Maximum number of retries
    wait=wait_exponential(multiplier=1, min=1, max=60),  # Exponential backoff
    reraise=True  # Re-raise the last exception instead of a RetryError
)
def make_request(data):
    return requests.post('https://scrapingcourse.com/ecommerce/', data=data)

try:
    response = make_request({'key': 'value'})
    print(f'SUCCESS: {response.text}')
except requests.RequestException as e:
    print(f'FAILED: {e}')
Conclusion
Handling failed requests is critical to building a robust and reliable web scraper. In this tutorial, we looked into the importance of retrying failed requests and what to know to code them. Now you know:
- The most important Python Requests retry logic considerations.
- The two best options for retries.
- How to retry requests with different HTTP methods.
One of the biggest challenges is getting access denied because you're detected as a bot. To overcome that barrier, a popular web scraping API like ZenRows will help prevent you from getting blocked and save you tons of time and effort against anti-bot measures. Try it for free now!
Frequent Questions
How Do You Retry a Request in Python?
You can retry a request in Python either by using the existing retry wrapper (Requests' HTTPAdapter with urllib3's Retry) or by creating a custom wrapper with loops and exception handling to implement the retry mechanism.
How Do You Force Keep-Alive in Python Requests?
You can force Keep-Alive in Python Requests by using a Session object and setting the Connection header to keep-alive. That lets the underlying TCP connection be reused for other requests to the same server, which improves performance when making multiple requests to the same endpoint.
import requests

# Create a session object
session = requests.Session()

# Set keep-alive for all requests made through this session
session.headers.update({'Connection': 'keep-alive'})
How Do You Handle Timeouts in Python Requests?
You can handle timeouts using the timeout parameter, which specifies the maximum amount of time (in seconds) the request should wait for a response before raising a timeout exception.
# pip install requests
import requests
response = requests.get('https://example.com', timeout=15)
print(response.text)
What is Error Code 404 in Python Requests?
Error code 404 is the HTTP status Not Found, which means the server couldn't find the requested resource.