Error 403 in Web Scraping: 7 Easy Solutions

Idowu Omisola
September 26, 2024 · 7 min read

The 403 error code is a common HTTP response indicating that the server understood your request but refuses to fulfill it. It's often returned when a Cloudflare-protected website recognizes your traffic as automated and denies access to the content.

This is what the error might look like in your terminal:

Output
HTTPError: 403 Client Error: Forbidden for url: https://www.scrapingcourse.com/cloudflare-challenge

Fortunately, you can overcome the 403 Forbidden error with seven actionable techniques.

  1. Use a scraping API to bypass error 403 in web scraping.
  2. Set a fake User Agent.
  3. Complete your headers.
  4. Avoid IP bans.
  5. Make requests through proxies.
  6. Use a headless browser.
  7. Use anti-Cloudflare plugins.

Keep reading to see how to implement them.

1. Use a Scraping API to Bypass Error 403 in Python (or Any Language)

Web scraping APIs are the most effective solution for bypassing the 403 Forbidden error in web scraping. Modern anti-bot systems employ increasingly sophisticated protection measures, including IP detection, browser fingerprinting, behavioral analysis, and more.

These defenses are designed to identify and block automated traffic, which makes it extremely challenging to bypass them using open-source solutions. Web scraping APIs automatically handle all these protection measures, managing the complexities of mimicking human-like behavior at scale.

Solutions like ZenRows web scraping API provide all the features necessary to imitate natural user behavior, including JavaScript rendering, dynamic IP rotation, browser fingerprinting, advanced anti-bot bypass, and more. This allows you to avoid the 403 Forbidden error and scrape without getting blocked.

Let's see how ZenRows works, using a Cloudflare-protected page as the target URL.

Sign up for free, and you'll get to the Request Builder page:

(Screenshot: building a scraper with the ZenRows Request Builder)

Input the target URL (in this case, https://www.scrapingcourse.com/cloudflare-challenge), then activate the Premium Proxies and JS Rendering boost mode.

Select your preferred programming language (in this case, Python) and click on the API tab. That'll generate your request code. 

Copy the generated code, then install the Python Requests library using the following command:

Terminal
pip3 install requests

Your code should look like this:

Example
import requests

url = "https://www.scrapingcourse.com/cloudflare-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

Run it, and you'll get the following result.

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Cloudflare Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Cloudflare challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Awesome, right? ZenRows makes bypassing the 403 Forbidden web scraping error in Python or any other language easy.

2. Set a Fake User Agent

Another way to bypass the 403 Forbidden web scraping error is by setting up a fake User Agent. It's a string sent by web clients with every request to identify themselves to the web server.

Non-browser web clients have unique User Agents that servers use to detect and block them. For example, here's what the Python Requests User Agent looks like:

Example
python-requests/2.32.3

And here's a Chrome User Agent:

Example
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36

The examples above show how easy it is for websites to differentiate between the two.

Yet, you can manipulate your User Agent (UA) string to appear like that of Chrome or any other browser. In Python Requests, just pass the fake User Agent as part of the headers parameter in your request.

Creating a working UA string can get complex, so check out our list of best web scraping User Agents you can use.

Let's see how to set a User Agent in Python by adding the new UA to the headers dictionary used to make the request:

Example
import requests

url = "https://httpbin.io/user-agent"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
}

# set headers
response = requests.get(url, headers=headers)

print(response.text)

You'll get the following output:

Output
{
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
}

For the best results, you must randomize UAs. Read our tutorial on User Agents in Python to learn how to do that.
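
As a starting point, here's a minimal sketch of UA rotation using Python's built-in random module and a small hard-coded pool (the UA strings below are only placeholders; pull from a maintained, up-to-date list in practice):

Example
import random
import requests

# a small pool of User-Agent strings to rotate through (placeholders; use a maintained list in practice)
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
]

# pick a random User Agent for each request
headers = {"User-Agent": random.choice(user_agents)}

response = requests.get("https://httpbin.io/user-agent", headers=headers)
print(response.text)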

It's important to note that setting a UA alone may not be enough, as websites can identify other patterns and block your scraper.

3. Complete Your Headers

When making requests with web clients like Python Requests and Selenium, the default headers don't include all the data that websites expect from a regular user's request. That makes your requests stand out as suspicious, potentially triggering a 403 web scraping error from WAFs like Imperva.

For example, these are the Python Requests' default headers:

Output
{
  "headers": {
    "Accept": [
      "*/*"
    ],
    "Accept-Encoding": [
      "gzip, deflate"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "User-Agent": [
      "python-requests/2.32.3"
    ]
  }
}

And these are the headers of a regular web browser:

Example
headers = {
    'authority': 'www.google.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'cookie': 'SID=ZAjX93QUU1NMI2Ztt_dmL9YRSRW84IvHQwRrSe1lYhIZncwY4QYs0J60X1WvNumDBjmqCA.; __Secure-...',  # truncated
    'sec-ch-ua': '"Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"',
    'sec-ch-ua-arch': '"x86"',
    'sec-ch-ua-bitness': '"64"',
    'sec-ch-ua-full-version': '"115.0.5790.110"',
    'sec-ch-ua-full-version-list': '"Not/A)Brand";v="99.0.0.0", "Google Chrome";v="115.0.5790.110", "Chromium";v="115.0.5790.110"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-model': '""',
    'sec-ch-ua-platform': 'Windows',
    'sec-ch-ua-platform-version': '15.0.0',
    'sec-ch-ua-wow64': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36',
    'x-client-data': '#..',
}

The difference is clear. But you can make your headers look like a regular browser's using Python Requests. To do that, define the browser headers in the headers dictionary used to make the request, just as you did with the User Agent.

Example
import requests

url = "https://httpbin.io/headers"

headers = {
    "authority": "www.google.com",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "max-age=0",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    # add more headers as needed
}

# set headers
response = requests.get(url, headers=headers)

# print response
print(response.text)

When you run the code, you'll get the following output:

Output
{
  "headers": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ],
    "Accept-Encoding": [
      "gzip, deflate"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Authority": [
      "www.google.com"
    ],
    "Cache-Control": [
      "max-age=0"
    ],
    "Connection": [
      "keep-alive"
    ],
    "Host": [
      "httpbin.io"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
    ]
  }
}

You can obtain real browser headers by inspecting any web page, preferably your target website, with a regular browser: open the developer tools, go to the Network tab, select any request, and copy its headers. However, for cleaner code and header structure, check out our guide on HTTP headers for web scraping.

4. Avoid IP Bans

Too many requests from the same IP address can result in IP bans. Most websites employ request rate limits to control traffic and resource usage. Therefore, exceeding a predefined limit will get you blocked.

In this case, you can prevent IP bans by implementing delays between successive requests or request-throttling (limiting the number of requests you make within a certain time frame).

To delay requests in Python, set the delay between consecutive requests, specify the total number of requests to make, iterate through them, and use the time.sleep() function to pause between requests.

Example
import requests
import time


# replace this with the target website URL 
url = 'https://www.example.com' 
 
headers = {
    # add your custom headers here
}
 
# define the time delay between requests (in seconds)
delay_seconds = 2
 
# number of requests you want to make 
num_requests = 10
 
for i in range(num_requests):
    response = requests.get(url, headers=headers)

    # print the response
    if response.status_code == 200:
        print(f"Request {i + 1} successful!")
        print(response.text)
    else:
        print(f"Request {i + 1} failed with status code: {response.status_code}")

    # introduce a delay between requests
    time.sleep(delay_seconds)
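
On top of fixed delays, a common refinement is to back off when the server starts blocking you. Here's a minimal sketch, assuming the server signals blocking or rate limiting with 403 or 429 responses (the retry values are illustrative):

Example
import random
import time

import requests

# replace this with the target website URL
url = "https://www.example.com"

# illustrative retry settings
max_retries = 5

for attempt in range(max_retries):
    response = requests.get(url)

    # stop retrying once the request is no longer blocked or rate-limited
    if response.status_code not in (403, 429):
        print(f"Got status {response.status_code} on attempt {attempt + 1}")
        break

    # wait 2, 4, 8, ... seconds plus random jitter before retrying
    wait_seconds = 2 ** (attempt + 1) + random.uniform(0, 1)
    print(f"Blocked with {response.status_code}, retrying in {wait_seconds:.1f}s")
    time.sleep(wait_seconds)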

However, delays and throttling alone can only get you so far. So, while understanding rate limiting and IP bans is critical to avoiding the 403 web scraping error, you also need to implement proxies.

5. Make Requests through Proxies

A more robust technique to avoid IP bans is routing requests through proxies. Proxies act as intermediaries between you and the target server, allowing you to scrape using multiple IP addresses and reducing the risk of getting blocked due to excessive requests.

The most common proxy types for web scraping are datacenter and residential. The former are IP addresses provided by data centers, which are easily detected by websites. Residential proxies, on the other hand, are real IP addresses of home devices and are generally more reliable and difficult for websites to detect.

Check out our guide on the best web scraping proxies to learn more.

To use proxies in Python Requests, add your proxy details to the requests.get() function using the proxies parameter. You can grab some free proxies from Free Proxy List, but they only work for testing; you'll need premium proxies in a real environment.

Example
import requests

# replace this with the target website URL
url = "https://httpbin.io/ip"

headers = {
    # add your custom headers here
}

# define the proxy you want to use
proxy = {
    "http": "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>",
    "https": "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>",
}

response = requests.get(url, headers=headers, proxies=proxy)

# print the response
print(response.text)
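
The snippet above routes every request through a single proxy. To spread traffic across multiple IP addresses, as described earlier, you can rotate through a pool of proxies. Here's a minimal sketch with placeholder proxy URLs:

Example
import random
import requests

url = "https://httpbin.io/ip"

# placeholder proxy pool; replace with your own premium proxy URLs
proxy_pool = [
    "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_1>:<PROXY_PORT_1>",
    "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_2>:<PROXY_PORT_2>",
    "http://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_3>:<PROXY_PORT_3>",
]

# pick a different proxy at random for each request
for _ in range(3):
    proxy_url = random.choice(proxy_pool)
    proxies = {"http": proxy_url, "https": proxy_url}
    response = requests.get(url, proxies=proxies)
    print(response.text)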

It's important to note that free proxies are generally unreliable for web scraping: they often have slow connection speeds and frequent downtime and are blocked by most websites. For best results, use residential proxies with Python Requests.

An excellent option is to use a residential proxy service like the one offered by ZenRows. ZenRows' residential proxies provide several advantages for web scraping, including high-quality IPs, autorotation, geolocation access, and more.

However, like with User Agents, proxies alone may not be enough, so head to the next tip below.

6. Use a Headless Browser

Rendering JavaScript is critical for scraping modern websites, especially single-page applications (SPAs) that rely heavily on client-side rendering to display content. Moreover, anti-bot measures often throw JavaScript challenges to confirm whether the requester is a human using a real browser or a bot.

Attempting to scrape web pages that depend on JavaScript with HTTP request libraries like Python Requests will lead to incomplete data or the 403 web scraping error.

Fortunately, browser automation tools like Selenium let you drive a headless browser, render JavaScript, and ultimately solve those challenges. Find out more in our guide on web scraping using Selenium with Python.
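
As an illustration, here's a minimal sketch that loads a page with Selenium's headless Chrome mode and prints the rendered HTML (Selenium 4 manages the ChromeDriver binary for you; the URL is a placeholder):

Example
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# configure Chrome to run without a visible window
options = Options()
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)

# load the page and let the browser execute its JavaScript
driver.get("https://www.example.com")  # replace with your JavaScript-heavy target
print(driver.page_source)

driver.quit()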

7. Use Anti-Cloudflare Plugins

As mentioned earlier, Cloudflare-protected websites are the main source of the 403 Forbidden error that developers get while web scraping. Cloudflare acts as a reverse proxy between you and the target web server, granting access only to requests that look human.

Using anti-Cloudflare plugins may be enough to bypass Cloudflare in some cases. Some of those include:

  • Undetected ChromeDriver: a Selenium plugin that shields Selenium from bot detection mechanisms.
  • Cloudscraper: a tool that works like the Python Requests library but uses JS engines to solve Cloudflare's JavaScript challenges (see the sketch below).
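
For instance, a minimal Cloudscraper sketch looks just like a regular Requests call; whether it actually gets past the challenge depends on the target's protection level:

Example
import cloudscraper

# create a session that attempts to solve Cloudflare's JavaScript challenges
scraper = cloudscraper.create_scraper()

# the session is used like a regular Requests session
response = scraper.get("https://www.scrapingcourse.com/cloudflare-challenge")
print(response.status_code)
print(response.text)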

While these plugins can be useful, it's important to understand their limitations. The primary challenge with anti-Cloudflare plugins is that they often struggle to keep pace with Cloudflare's frequent updates and evolving anti-bot measures.

Cloudflare continuously refines its detection methods, which makes it increasingly difficult for these plugins to remain effective over time. As a result, solutions that work today may become obsolete tomorrow, thus demanding constant updates and maintenance.

Conclusion

While free solutions exist, they require complex setups and frequent updates and may still fail. For a more reliable approach, consider using a web scraping API like ZenRows. It automatically handles anti-bot measures, proxy rotation, and other technical hurdles, allowing you to focus on data extraction rather than access issues.

This approach not only saves time and resources but also provides a much higher success rate in your web scraping projects, especially when dealing with sophisticated websites that employ advanced protection mechanisms. Try ZenRows for free!

Frequently Asked Questions

What Is 403 Forbidden while Scraping?

403 Forbidden while scraping is an error response code that means the web server detects your scraping activities and denies you access. That happens because anti-bot systems flag your requests as automated and block them.

What Is Error 403 in Python Scraping?

Error 403 in Python scraping refers to the Forbidden HTTP status code that arises when a web server denies your request. Encountering this error when scraping with Python is common because Python HTTP libraries have unique signatures and fingerprints that anti-bot measures easily flag. For example, Python Requests' default User Agent and incomplete headers identify it to the web server as a bot.

How Do I Catch a 403 Error?

To catch a 403 error when scraping with Python, use try and except blocks: make the request inside the try block, and if a 403 error occurs, the corresponding except block will catch it, allowing you to implement the bypass techniques.
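
For instance, here's a minimal sketch that relies on Requests' raise_for_status() to turn error responses into exceptions and then checks for a 403:

Example
import requests

url = "https://www.scrapingcourse.com/cloudflare-challenge"

try:
    response = requests.get(url)
    # raise an HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    print(response.text)
except requests.exceptions.HTTPError as error:
    if error.response.status_code == 403:
        print("Got 403 Forbidden: apply one of the bypass techniques above")
    else:
        raise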

How Do I Bypass the 403 Error in Web Scraping?

To bypass the 403 error in web scraping, you must emulate natural user behavior because the error is mainly due to anti-bot measures. You can use a web scraping API like ZenRows to abstract the process of mimicking an actual browser while you focus on extracting the necessary data.

Can You Bypass a 403 Error?

Yes, you can bypass a 403 error by implementing techniques and/or tools that make you appear like a regular user. Strategies like headless browsers, rotating proxies, and User Agents, or using a web scraping API like ZenRows can help you solve a 403 error.

How Do I Get Past 403 Forbidden in Python?

You can get past 403 Forbidden in Python by employing methods to mimic an actual browser request. Techniques such as using headless browsers, rotating premium residential proxies, and completing HTTP headers can help you achieve that.

Why Am I Getting a 403 Error When Web Scraping?

A 403 error during web scraping typically occurs for two main reasons. First, the website may require specific permissions to access certain content, and your scraper doesn't have them. Second, the website has likely identified your activity as automated scraping and is blocking your access in response. This is a common protective measure websites use to control access to their data and prevent excessive automated requests.
