
Python Requests: Retry Failed Requests [2024]

August 7, 2023 · 8 min read

Although the Requests library makes it easier to make HTTP requests in Python, failed requests are common due to network issues, server errors, and other causes. This tutorial introduces the different causes of failure and teaches you how to build Python Requests retry logic to attempt your requests again.

The two main methods we'll cover are:

  • Using an existing retry wrapper: Python Sessions with HTTPAdapter.
  • Coding your own retry wrapper.

What to Know to Build the Python Requests' Retry Logic

Should retries be attempted in all cases or only in specific scenarios? When is the appropriate time to retry, and how many attempts should be made?

In this section, we'll answer those questions and provide code examples to help you build Python Requests retry logic for your web crawler.

Types of Failed Requests

Understanding the reasons behind a failed request will allow you to develop strategies to deal with each case. Broadly speaking, there are requests that time out (there's no response from the server) and requests that return an error.

Let's see each one.

Timed out

A request may time out, resulting in no response from the server. That can happen for several reasons, such as overloaded servers, problems with how the server responds, or slow network connections.

When faced with timeout scenarios, consider checking your internet connection, as a stable connection may suggest the problem is server related.

You can catch exceptions related to timeouts, such as requests.Timeout, and implement a Python retry mechanism conditionally or with strategies like exponential backoff. We'll look at these later on.
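For instance, here's a minimal sketch of catching a timeout and retrying a fixed number of times (the retry count and timeout value below are arbitrary choices, not recommendations from the library):

program.py
import requests

url = 'https://scrapingcourse.com/ecommerce/'

for attempt in range(3):  # arbitrary limit of 3 attempts
    try:
        response = requests.get(url, timeout=5)  # give the server 5 seconds to respond
        print(response.status_code)
        break  # the request got a response, so stop retrying
    except requests.Timeout:
        print(f'Attempt {attempt + 1} timed out, retrying...')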

Returned an Error

When a request is unsuccessful, it'll most often return an error response, which typically comes with a specific status code and an error message. The first tells what went wrong, and the second includes additional information that can provide insights into the actual problem. For instance:

Output
404 Not Found

Your first approach to addressing this scenario is to review both the status code and error message while ensuring that the request is properly formed. If you suspect that the error results from a temporary problem or server issues, you may retry the request with caution.
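As a quick illustration (the URL path below is a made-up example of a missing resource), you can inspect both pieces of information like this:

program.py
import requests

# Hypothetical missing page used only for illustration
response = requests.get('https://scrapingcourse.com/ecommerce/this-page-does-not-exist')

# The status code tells you what went wrong...
print(response.status_code)  # e.g., 404
# ...and the reason phrase and response body add context
print(response.reason)  # e.g., 'Not Found'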


Status Codes for a Python Requests Retry Loop

Errors in client-server communication fall into the 4xx (client error) and 5xx (server error) status code ranges. They include:

  • 400 Bad Request.
  • 401 Unauthorized.
  • 403 Forbidden.
  • 404 Not Found.
  • 405 Method Not Allowed.
  • 408 Request Timeout.
  • 429 Too Many Requests.
  • 500 Internal Server Error.
  • 501 Not Implemented.
  • 502 Bad Gateway.
  • 503 Service Unavailable.
  • 504 Gateway Timeout.
  • 505 HTTP Version Not Supported.

The most common ones you'll see while web scraping are:

  • 403 Forbidden: The server understands the request but won't fulfill it because the client doesn't have the right permissions or access.
  • 429 Too Many Requests: The server has received too many requests from the same IP within a given time frame; this is rate limiting in web scraping.
  • 500 Internal Server Error: A generic server error occurred, indicating that something went wrong on the server while processing the request.
  • 502 Bad Gateway: The server acting as a gateway or proxy received an invalid response from an upstream server.
  • 503 Service Unavailable: The server is too busy or undergoing maintenance and can't handle the request right now.
  • 504 Gateway Timeout: An upstream server didn't respond quickly enough to the gateway or proxy.

You can check out the MDN docs for more information on HTTP response status codes.

Number of Retries

Setting the number of retries for a failed request depends on several considerations, such as the type of error and the response time. Temporary errors like 429 Too Many Requests deserve more retries than errors that are unlikely to resolve on their own.

While there's no best maximum number of retries, it's recommended to set a reasonable limit to avoid indefinite retries and potential performance issues. You can start with small values like three or five.
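As a rough illustration of that idea, you could keep a per-status retry budget (the numbers below are arbitrary starting points, not values from this article):

program.py
# Hypothetical retry budgets per status code
retries_per_status = {
    429: 5,  # temporary rate limiting: worth a few more attempts
    500: 3,
    503: 3,
    403: 1,  # usually a block, so extra retries rarely help
}

def max_retries(status_code, default=3):
    # Fall back to a small default for codes we haven't listed
    return retries_per_status.get(status_code, default)

print(max_retries(429))  # 5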

Delay

Delays between requests should be set to prevent websites and APIs from becoming overloaded and to maintain compliance with rate limits.

Fixed or Random Delay

A fixed delay between requests can be introduced with the time.sleep() function from the time module. To add randomness to the delay, combine time.sleep() with the random module.

Just like the number of retries, there isn't a rule set in stone for how long the delay should be, but you can experiment with different reasonable delay values around 300ms to find an optimal balance. 
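Here's a minimal sketch of both approaches (the exact values are up to you; these sit around the 300 ms ballpark mentioned above):

program.py
import random
import time

# Fixed delay: always wait 300 ms between requests
time.sleep(0.3)

# Random delay: wait somewhere between 200 ms and 500 ms
time.sleep(random.uniform(0.2, 0.5))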

Backoff Strategy for the Delays

The backoff strategy is a commonly used technique for setting increasing delays between retries instead of fixed or random ones. The delay grows exponentially with each retry, scaled by a backoff factor that's usually greater than one. This approach generally helps handle temporary issues while avoiding overloading servers.

The backoff algorithm is this:

Example
backoff_factor * (2 ** (current_number_of_retries - 1))

For example, here are the delay sequences for backoff factors 2, 3, and 10:

Example
# 2
1, 2, 4, 8, 16, 32, 64, 128
 
# 3
1.5, 3, 6, 12, 24, 48, 96, 192
 
# 10
5, 10, 20, 40, 80, 160, 320, 640
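If you want to reproduce these sequences yourself, here's a quick sketch that applies the formula above, counting retries from zero:

program.py
# Print the first eight delays for each backoff factor,
# using the formula above with retries counted from zero
for backoff_factor in (2, 3, 10):
    delays = [backoff_factor * (2 ** (n - 1)) for n in range(8)]
    print(backoff_factor, delays)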

Best Methods to Retry Python Requests

In this section, we'll look at the best methods to retry Python Requests. They include:

  • Use an existing retry wrapper: Python Sessions with HTTPAdapter.
  • Code your retry wrapper.

We recommend the first one, but the second one might be suitable in some scenarios.

Method 1: Use an Existing Retry Wrapper: Python Sessions with HTTPAdapter

Python Requests uses the urllib3 HTTP client under the hood. With Requests' HTTPAdapter class and the Retry utility class from the urllib3 package, you can set up retries in Python. The HTTPAdapter class lets you attach a retry strategy and customize how requests behave.

Retry on Failure

To implement the Python Requests retry logic in case of failure, start by defining the retry options.

We set the maximum number of retries to 4 and specified that a request should only be reattempted if the response has a status code of 429, 500, 502, 503, or 504.

program.py
# Define the retry strategy
retry_strategy = Retry(
    total=4,  # Maximum number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry on
)

The retry strategy is passed to the HTTPAdapter when creating a new adapter object. The adapter is then mounted to a session object, which is used to make all requests.

program.py
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry
 
# Define the retry strategy
retry_strategy = Retry(
    total=4,  # Maximum number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry on
)
# Create an HTTP adapter with the retry strategy and mount it to session
adapter = HTTPAdapter(max_retries=retry_strategy)
 
# Create a new session object
session = requests.Session()
session.mount('http://', adapter)
session.mount('https://', adapter)
 
# Make a request using the session object
response = session.get('https://scrapingcourse.com/ecommerce/')
 
if response.status_code == 200:
    print(f'SUCCESS: {response.text}')
else:
    print("FAILED")

Sessions and HTTPAdapter with a Backoff Strategy

To use the backoff strategy to set increasing delays between retries, add the backoff_factor parameter in the retry wrapper:

program.py
# ...
# Define the retry strategy
retry_strategy = Retry(
    total=4,  # Maximum number of retries
    backoff_factor=2,  # Exponential backoff factor (e.g., 2 means 1, 2, 4, 8 seconds, ...)
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry on
)
# ...

Method 2: Code Your Retry Wrapper

Unlike in the previous option, we'll create a custom wrapper for the retry logic now. That way, you'll have the flexibility of implementing a custom error handler, logging, and more.

Python Requests: Retry on Failure

To keep it simple, let's create a Python function named retry_request that replicates the retry logic of method 1.

It takes the target URL as its first argument, then total for the maximum number of retries, and status_forcelist to specify the status codes for which the request should be retried.

program.py
import requests
 
def retry_request(url, total=4, status_forcelist=(429, 500, 502, 503, 504), **kwargs):
    # Make number of requests required
    for _ in range(total):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in status_forcelist:
                # Retry request 
                continue
            return response
        except requests.exceptions.ConnectionError:
            pass
    return None
 
response = retry_request('https://scrapingcourse.com/ecommerce/')
if response is not None:
    print(response.text)

Retry Python Requests with a Backoff Strategy

To retry Python Requests with a backoff strategy, take the previous code as a base. Then, create a separate function named backoff_delay to calculate the delay and apply it with the time.sleep() function:

program.py
def backoff_delay(backoff_factor, attempts):
    # backoff algorithm
    delay = backoff_factor * (2 ** attempts)
    return delay

Using the backoff_delay function, you'll have the following:
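Below is one way the two pieces could fit together (a sketch building on the earlier retry_request; your exact implementation may differ):

program.py
import time

import requests

def backoff_delay(backoff_factor, attempts):
    # backoff algorithm
    return backoff_factor * (2 ** attempts)

def retry_request(url, total=4, status_forcelist=(429, 500, 502, 503, 504), backoff_factor=2, **kwargs):
    for attempt in range(total):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in status_forcelist:
                # Wait with an increasing delay, then retry
                time.sleep(backoff_delay(backoff_factor, attempt))
                continue
            return response
        except requests.exceptions.ConnectionError:
            time.sleep(backoff_delay(backoff_factor, attempt))
    return None

response = retry_request('https://scrapingcourse.com/ecommerce/')
if response is not None:
    print(response.text)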

Avoid Getting Blocked by Error 403 with Python Requests

Getting identified as a bot and blocked is the biggest problem when crawling. Websites may block your IP address or take other measures to prevent you from accessing the site once they detect automated traffic.

To prove this, let's attempt to scrape a protected page on G2.com:

program.py
import requests
 
response = requests.get('https://www.g2.com/products/notion/reviews')
print(response.text, response.status_code)

Run the code, and you'll get a response like the one below, indicating that you were blocked with a 403 error:

Output
// ...
   <div class="cf-main-wrapper" role="main">
      <div class="cf-header cf-section">
         <div class="cf-error-title">
            <h1>Access denied</h1>
            <span class="cf-code-label">Error code <span>1020</span></span>
         </div>
         <div class="cf-error-description">
            <p>You do not have access to www.g2.com.</p><p>The site owner may have set restrictions that prevent you from accessing the site.</p>
         </div>
      </div>
   </div>
 
// ...
 
403

In a scenario like the one we just encountered, web scraping proxies may seem like a solution. However, proxies alone aren't enough, and individual libraries and customizations are unlikely to keep up with evolving anti-bot systems. That's why many developers use a web scraping API as a dedicated toolkit to avoid getting blocked.

ZenRows is a popular web scraping API that ships with features such as premium proxies, rotating headers, and much more. And it can be integrated with Python Requests. Let's see it in action!

Sign up on ZenRows to get your free API key, which you'll use to scrape through API requests. You'll land on the Request Builder, where the recommended configuration is to activate the Anti-bot mode together with premium proxies.

A screenshot of the ZenRows Request Builder dashboard

You'll get your scraper code on the right.

program.py
import requests
 
url = 'https://www.g2.com/products/notion/reviews'
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&antibot=true&[email protected]:8001'
proxies = {'http': proxy, 'https': proxy}
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)

Replace <YOUR_ZENROWS_API_KEY> with the API key from the dashboard and run the code. You'll get a response similar to the one below:

Notion reviews response screenshot

Great! You've scraped the Notion reviews page on G2 with the help of ZenRows.

Best Practice: Retry Python Requests with a Decorator

Using a decorator to implement retries is a cleaner approach, as the Python Requests retry logic contained in the decorator can be easily applied to multiple methods or functions.

Instead of implementing the decorator yourself, you can use Tenacity, a community-maintained package that simplifies the process of adding retry behavior to requests.

Start by installing Tenacity:

Terminal
pip install tenacity

The retry decorator from Tenacity takes arguments like stop for the maximum number of attempts and wait for the delay strategy, among others.

program.py
# Define the retry decorator
@retry(
    stop=stop_after_attempt(4),  # Maximum number of attempts
    wait=wait_exponential(multiplier=1, min=1, max=60),  # Exponential backoff
    reraise=True  # Re-raise the original exception once the attempts are exhausted
)

Here you have it implemented in a scraper:

program.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential
 
# Define the retry decorator
@retry(
    stop=stop_after_attempt(4),  # Maximum number of attempts
    wait=wait_exponential(multiplier=1, min=1, max=60),  # Exponential backoff
    reraise=True  # Re-raise the original exception once the attempts are exhausted
)
def make_request():
    response = requests.get('https://scrapingcourse.com/ecommerce/')
    response.raise_for_status()  # Treat error status codes as failures so they're retried
    return response
 
try:
    response = make_request()
    print(f'SUCCESS: {response.text}')
except requests.RequestException as e:
    print(f'FAILED: {e}')

POST Retry with Python Requests

In addition to the GET requests we've been using, other methods can be retried, such as POST for creating new resources on the server and PUT for updating existing ones, for example when submitting a form.

You can use Tenacity to retry a POST request by replacing requests.get with requests.post. Check out line 11:

program.py
import requests
from tenacity import retry, stop_after_attempt, wait_exponential
 
# Define the retry decorator
@retry(
    stop=stop_after_attempt(4),  # Maximum number of attempts
    wait=wait_exponential(multiplier=1, min=1, max=60),  # Exponential backoff
    reraise=True  # Re-raise the original exception once the attempts are exhausted
)
def make_request(data):
    response = requests.post('https://scrapingcourse.com/ecommerce/', data=data)
    response.raise_for_status()  # Treat error status codes as failures so they're retried
    return response
 
try:
    response = make_request({'key': 'value'})
    print(f'SUCCESS: {response.text}')
except requests.RequestException as e:
    print(f'FAILED: {e}')

Conclusion

Handling failed requests is critical to building a robust and reliable web scraper. In this tutorial, we looked into the importance of retrying failed requests and what to know to code them. Now you know:

  • The most important Python Requests retry logic considerations.
  • The two best options for retries.
  • How to retry requests with different HTTP methods.

One of the biggest challenges is getting access denied because you're detected as a bot. To overcome that barrier, a popular web scraping API like ZenRows will help prevent you from getting blocked and save you tons of time and effort against anti-bot measures. Try it for free now!

Frequent Questions

How Do You Retry a Request in Python?

You can retry a request in Python by using the retry support that Requests exposes through urllib3's Retry class and HTTPAdapter, or by creating a custom wrapper with loops and exception handling to implement your own retry mechanism.

How Do You Force Keep-Alive in Python Requests?

Python Requests keeps connections alive by default when you use a Session object, which reuses the underlying TCP connection for requests to the same server and improves performance when making multiple requests to the same endpoint. You can also set the Connection header to keep-alive explicitly:

import requests
 
# Create a session object
session = requests.Session()
 
# Set keep-alive for all requests made through this session
session.headers.update({'Connection': 'keep-alive'})

How Do You Handle Timeouts in Python Requests?

You can handle timeouts using the timeout parameter, which specifies the maximum amount of time (in seconds) the request should wait for a response, and by catching the requests.Timeout exception it raises:

# pip install requests
import requests
 
try:
    response = requests.get('https://example.com', timeout=15)
    print(response.text)
except requests.Timeout:
    print('The request timed out')

What is Error Code 404 in Python Requests?

Error code 404 is the same as the HTTP status Not Found, which means that the server couldn't find the requested resource.
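For example, you can check for it explicitly (the URL below is just a placeholder):

import requests
 
response = requests.get('https://example.com/some-missing-page')
if response.status_code == 404:
    print('Resource not found')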
