Are you scraping a DataDome-protected website but getting blocked by a 403 forbidden error? You've come to the right place to fix it!
This article explains the meaning of the DataDome 403 error and provides the four best ways to fix it during web scraping.
What Is DataDome 403?
The 403 error is common in web scraping. It means the server understands your request but refuses to fulfill it. A DataDome 403 error, then, occurs when the DataDome security system blocks your access to a website during web scraping.
With a DataDome-protected website like Best Western, the error might look like this in your terminal:
Failed to load the page. Status: 403 forbidden for https://www.bestwestern.com/
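This kind of block typically appears when you send a plain request without any anti-bot measures. Here's a minimal sketch that reproduces it with Python's Requests library (the exact response depends on DataDome's checks):
# pip install requests
import requests

url = "https://www.bestwestern.com/"
response = requests.get(url)

# a plain HTTP client is an easy target for DataDome, so expect a 403 here
if response.status_code == 403:
    print(f"Failed to load the page. Status: 403 forbidden for {url}")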
A screenshot of that web page would show DataDome's block page instead of the site's actual content.
The only way to solve this error is to bypass DataDome.
Solutions to Fix 403 Error With DataDome
The DataDome 403 forbidden error will deny you access to your target data. In this section, we will examine four ways to solve it, starting with the best and easiest method.
1. Use a Web Scraping API
A web scraping API is the easiest solution for hassle-free anti-bot bypass. An example is ZenRows, an all-in-one scraping solution that provides auto-rotating premium proxies, configures your request headers, and bypasses DataDome CAPTCHAs and other anti-bot systems at scale.
ZenRows is a game-changer: it works with any programming language and can act as a headless browser, with JavaScript instructions for scraping dynamic web pages, such as those that use infinite scrolling.
Let's use ZenRows to scrape the previous DataDome-protected website (Best Western) to see how it works.
Sign up to open the ZenRows Request Builder. Paste the target page URL in the link box, activate JS Rendering (the Boost mode), and enable Premium Proxies. Then, select Python as your programming language and choose the API connection mode. Copy the generated code into your Python script. Here it is, with slight modifications:
# pip install requests
import requests
# define your request parameters
params = {
"url": "https://www.bestwestern.com/",
"apikey": "<YOUR_ZENROWS_API_KEY>",
"js_render": "true",
"premium_proxy": "true",
}
# send your request and get the response
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above code extracts the full-page HTML of the protected website. See the result below, showing the page title with some omitted content:
<html lang="en-us">
<head>
<title>Best Western Hotels - Book Online For The Lowest Rate</title>
</head>
<body class="bestWesternContent bwhr-brand">
<header>
<!-- ... -->
</header>
<!-- ... -->
</body>
</html>
Congratulations! You just scraped a DataDome-protected website with ZenRows and the Requests library. If you're willing to handle technical configurations yourself, keep reading to learn the other methods.
2. Make Use of a Headless Browser and Anti-Bot Plugin
Headless browsers like Selenium, Puppeteer, JSDom, and Playwright let you execute JavaScript, which can boost your likelihood of passing DataDome's JavaScript challenges.
However, these headless browsers still have bot-like properties, such as the WebDriver automation flag, which exposes them to anti-bot detection. Fortunately, some have plugins that improve your chances of evading blocks.
Here are the most common headless browsers and their anti-bot bypass plugins (a short usage sketch follows the list):
- Undetected ChromeDriver, a patched ChromeDriver for Selenium that removes common automation fingerprints.
- Puppeteer Stealth, a Puppeteer-extra plugin with evasions for Puppeteer.
- Playwright Stealth, an anti-bot bypass add-on for Playwright.
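For instance, here's a minimal sketch of using Undetected ChromeDriver with Selenium. It assumes Chrome is installed locally, and whether it gets past DataDome depends on the site's current detection rules:
# pip install undetected-chromedriver
import undetected_chromedriver as uc

# launch a patched Chrome instance that hides common WebDriver fingerprints
driver = uc.Chrome(headless=True)
driver.get("https://www.bestwestern.com/")
print(driver.page_source)
driver.quit()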
You can also back up these plugins with proxies. Let's see how that works in the next section.
3. Buy Premium Proxies
Proxies route your HTTP requests through another IP address, making it look like you're requesting from a different location. Adding a proxy to your scraper can prevent the DataDome 403 error that results from IP bans.
Free proxies are an option, but their short lifespan makes them unreliable. The better choice is premium web scraping proxies, which require authentication credentials such as a username and password. Most premium services also offer automatic proxy rotation.
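Here's a minimal sketch of routing a request through an authenticated proxy with the Requests library. The proxy host and credentials are placeholders for your provider's details:
# pip install requests
import requests

# placeholder credentials and host; replace with your proxy provider's details
proxy = "http://<USERNAME>:<PASSWORD>@<PROXY_HOST>:<PROXY_PORT>"
proxies = {"http": proxy, "https": proxy}

# the target server sees the proxy's IP address instead of yours
response = requests.get("https://httpbin.io/ip", proxies=proxies)
print(response.text)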
4. Change Your Headers
Request headers are key/value pairs that describe your scraper's HTTP client. They contain information about the client's User Agent, platform, accepted content type, encoding, language, and more, and they determine how the server responds to your request.
The HTTP clients in headless browsers often send incomplete or bot-like header parameters, exposing them to anti-bot detection. For instance, a headless browser like Puppeteer uses the following headers in headless mode:
{
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/119.0.6045.105 Safari/537.36',
'sec-ch-ua': '"HeadlessChrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"'
}
Compare that with the request headers sent by a real browser like Chrome, and you'll see that some parameters are either missing or incorrect.
First, the secure client hint user agent (Sec-CH-UA) and User-Agent strings in Puppeteer contain "HeadlessChrome". The accepted content type, encoding, and language headers (Accept, Accept-Encoding, and Accept-Language) are also missing. All these signals indicate that Puppeteer is an automated browser and can result in a DataDome 403 forbidden error.
See a typical request header set from a real Chrome browser below:
{
"headers": {
"Accept": [
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
],
"Accept-Encoding": [
"gzip, deflate, br, zstd"
],
"Accept-Language": [
"en-US,en;q=0.9"
],
"Connection": [
"keep-alive"
],
"Host": [
"httpbin.io"
],
"Sec-Ch-Ua": [
"\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\""
],
"Sec-Ch-Ua-Mobile": [
"?0"
],
"Sec-Ch-Ua-Platform": [
"\"Windows\""
],
"Sec-Fetch-Dest": [
"document"
],
"Sec-Fetch-Mode": [
"navigate"
],
"Sec-Fetch-Site": [
"none"
],
"Sec-Fetch-User": [
"?1"
],
"Upgrade-Insecure-Requests": [
"1"
],
"User-Agent": [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
]
}
}
You need to modify your scraper's request headers to appear more human. To achieve this, you can add the missing headers and edit the ones with incorrect values. Another tip is to order your request headers to mimic a real browser's arrangement.
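For example, here's a minimal sketch that sends a Chrome-like header set with the Requests library. The values mirror the sample above and are illustrative rather than tied to a specific Chrome release:
# pip install requests
import requests

# Chrome-like headers mirroring the sample set shown above
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Language": "en-US,en;q=0.9",
    "Upgrade-Insecure-Requests": "1",
}

# httpbin.io echoes the headers it receives, so you can verify what was sent
response = requests.get("https://httpbin.io/headers", headers=headers)
print(response.text)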
Check out our article on the common HTTP headers for web scraping for a full tutorial on optimizing your request headers.
Conclusion
In this article, you've learned four ways of fixing DataDome's 403 forbidden error. Bypass techniques, including headless browser plugins, premium proxies, and request header optimization, require manual configurations and are more effective when combined.
With that said, the ZenRows web scraping API remains the easiest and most reliable way to fix the DataDome 403 error without extra configuration. It bypasses sophisticated anti-bot systems behind the scenes and lets you scrape any website without getting blocked. Try ZenRows for free!