How to Use Cloudscraper in Python & Fix Common Errors

February 6, 2023 · 6 min read

Cloudscraper is a Python library for bypassing the Cloudflare waiting room, also known as "I'm Under Attack Mode" (IUAM).

With cybercrime rates rising, Cloudflare has become one of the leading web security solutions for protecting sites against bots and unwanted traffic. As a result, most data extraction projects will run into a Cloudflare-protected website sooner or later.

While Cloudflare keeps a safelist of allowed bots, such as Googlebot and other search engine crawlers, it generally identifies web scrapers as unwanted traffic. So, regardless of your intentions, you're likely to get blocked.

In this web scraping tutorial, you'll learn how to bypass Cloudflare using Cloudscraper. We'll also discuss the common errors you may encounter and how to fix them.

Here's what you need to do in a nutshell:
  1. Import Cloudscraper and other dependencies.
  2. Create a Cloudscraper instance and define your target website.
  3. Access the website to retrieve its data.

Now, let's dive into the details.

What Is Cloudscraper?

Cloudscraper is a scraping library built exclusively for retrieving data from Cloudflare-protected websites.

Cloudflare frequently updates its protections, but one of its bot detection techniques is testing whether a client can run JavaScript.

That's why Cloudscraper uses a JavaScript engine to solve these JavaScript challenges and appear as a legitimate browser.

How Do You Use Cloudscraper in Python?

Cloudscraper is built on top of Requests, so if you're familiar with that HTTP library, using Cloudscraper will be effortless.

To begin, call its built-in create_scraper() function.

This creates a Cloudscraper instance: a session object that automatically tries to solve Cloudflare's challenge for every request it sends. If your target website isn't protected by Cloudflare, the request goes through like a regular Requests call, with no additional configuration necessary.

In fact, scraper.get() is to Cloudscraper what requests.get() is to Requests: both send a GET request and return a Response object.

Prerequisites

First, you'll need Python 3. Keep in mind that some systems have it pre-installed.

Then, install Cloudscraper, Requests, and Beautiful Soup (which we'll use to parse HTML):

pip install requests cloudscraper beautifulsoup4
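To confirm the installation worked, a quick check with the standard library's importlib.metadata (Python 3.8+) can report which versions are installed. The installed_versions() helper is a hypothetical name used only for this sketch; note that it looks up distribution names such as beautifulsoup4, not import names such as bs4.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it's missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["requests", "cloudscraper", "beautifulsoup4"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```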

Can Requests Bypass Cloudflare?

Requests is one of the most popular Python libraries, with over 11 million downloads. It's the de facto tool for sending HTTP requests in Python.

However, on its own, it can't get past Cloudflare.

To demonstrate, we tried to scrape OpenSea's NFT Collection Stats, a Cloudflare-protected web page.

OpenSea

We sent an HTTP request to access our target website.

import requests

res = requests.get("https://opensea.io/rankings")
print("The status code is", res.status_code)
print(res.text)

But we got this result:

The status code is 403 
 
<!DOCTYPE html> 
<html lang="en-US"> 
	<head> 
		<title>Access denied</title>

The result above shows a 403 (Forbidden) status code, meaning the server refused our request. Instead of OpenSea's Collection stats page, Cloudflare returned an "Access denied" page.
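If you need to detect this situation in code rather than by eye, a rough heuristic is to check the status code together with the page title. The sketch below uses only the standard library's html.parser. The looks_blocked() helper and its marker list are assumptions: only "Access denied" comes from the output above, while "Just a moment" and "Attention Required" are titles commonly seen on other Cloudflare pages, so adjust them to what you actually observe.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Assumed markers: only "access denied" is taken from the response shown above.
BLOCK_TITLE_MARKERS = ("access denied", "just a moment", "attention required")


def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic: a 403/503 response whose title matches a known block-page marker."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip().lower()
    return status_code in (403, 503) and any(m in title for m in BLOCK_TITLE_MARKERS)
```

You could call it as looks_blocked(res.status_code, res.text) right after the request above.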

To get a clearer understanding, we saved the response locally to view it in a browser.
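Saving a response for inspection takes only a few lines. In this sketch, save_html() is a hypothetical helper, and the output filename is arbitrary.

```python
from pathlib import Path


def save_html(html: str, path: str = "response.html") -> Path:
    """Write the HTML to disk as UTF-8 and return its absolute path."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out.resolve()
```

For example, webbrowser.open(save_html(res.text, "opensea_blocked.html").as_uri()) writes the page above to disk and opens it in your default browser.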

OpenSea access denied

That happened because Cloudflare identified our request as coming from a bot and blocked it. To get past this Cloudflare error page, your scraper must appear as human as possible.

Fortunately, Cloudscraper can get you there to an extent. But more on that later.


How to Bypass Cloudflare Using Cloudscraper in Python?

A lot of work goes into bypassing Cloudflare DDoS protection.

However, with Cloudscraper, you don't need to worry about what goes on behind the scenes. You only need to create a scraper, send your request, and wait a few seconds to gain access.

Here's how to do it.
  1. Import Cloudscraper and other dependencies (BeautifulSoup).
from bs4 import BeautifulSoup 
import cloudscraper
  2. Create a Cloudscraper instance and define your target website.
scraper = cloudscraper.create_scraper() 
url = "https://opensea.io/rankings"
  3. Access the website to retrieve its data.
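info = scraper.get(url)

print(info.status_code)

# "gCpBEX" is an auto-generated CSS class on OpenSea's page, so it may change over time
soup = BeautifulSoup(info.text, "html.parser")
heading = soup.find(class_="gCpBEX")
print(heading.get_text() if heading else "Heading not found")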

The code above prints the response status code and the text of the "Collection stats" heading. Combining the three snippets gives you the complete script:

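from bs4 import BeautifulSoup
import cloudscraper

url = "https://opensea.io/rankings"

# create_scraper() returns a Requests-style session that handles Cloudflare's JS challenge
scraper = cloudscraper.create_scraper()
info = scraper.get(url)

print(info.status_code)

# "gCpBEX" is an auto-generated CSS class on OpenSea's page, so it may change over time
soup = BeautifulSoup(info.text, "html.parser")
heading = soup.find(class_="gCpBEX")
print(heading.get_text() if heading else "Heading not found")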

Running it produces the following output:

200 
Collection stats

Now, you've successfully bypassed your first Cloudflare DDoS protection.
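A single request can still fail for reasons unrelated to the challenge, such as rate limiting or a temporarily unavailable server. One option is to wrap the call in a small retry loop. The sketch below is hypothetical: get_with_retries(), its linear backoff, and the choice of 429 and 503 as retryable status codes are assumptions, and it works with any object that has a Requests-style get() method, including a Cloudscraper instance.

```python
import time


def get_with_retries(session, url, attempts=3, backoff=5.0,
                     retry_statuses=(429, 503), timeout=30.0):
    """GET url through a Requests-compatible session, retrying on selected statuses.

    Sleeps backoff, 2 * backoff, ... seconds between attempts (linear backoff)
    and returns the last response if every attempt hits a retryable status.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    response = None
    for attempt in range(1, attempts + 1):
        response = session.get(url, timeout=timeout)
        if response.status_code not in retry_statuses:
            break
        if attempt < attempts:
            time.sleep(backoff * attempt)
    return response
```

For instance: get_with_retries(cloudscraper.create_scraper(), "https://opensea.io/rankings"). Retrying won't help against a challenge Cloudscraper can't solve, so keep the number of attempts small.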

Cloudscraper Non-Default Features

Cloudscraper also offers many optional (non-default) features, which you can enable by passing arguments to functions such as create_scraper(), get_tokens(), and get_cookie_string().

Some examples include:

  • Browser/user agent filtering
  • Cookies
  • CAPTCHA
  • Delays
  • JavaScript engines and interpreters

Let's say you want to bypass a Cloudflare JavaScript challenge while presenting a mobile user agent.

For that, you'll need a JavaScript engine and some of the following parameters:

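import cloudscraper

scraper = cloudscraper.create_scraper(
	interpreter='nodejs',  # solve JS challenges with Node.js (must be installed separately)
	delay=10,  # override the challenge delay, in seconds
	browser={
		'browser': 'chrome',
		'platform': 'android',
		'desktop': False,  # only pick mobile user agents
	},
	captcha={
		'provider': '2captcha',
		'api_key': 'your_2captcha_api_key',  # placeholder: use your own key
	},
)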

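In the browser dictionary, the mobile and desktop keys both default to True, so set one of them to False if you want user agents from only the other category. That's why the example sets 'desktop' to False to get a mobile user agent.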

Cloudscraper also supports several other JavaScript engines and third-party CAPTCHA solvers. Check its PyPI documentation for the full list.

Can Cloudscraper Bypass Newer Cloudflare Versions?

Cloudflare frequently updates its bot protection techniques, so let's see how Cloudscraper fights against its newer versions.

For this example, we'll try to scrape Author Today (author.today), a website that uses a newer Cloudflare version.

When we visit this website in a browser, it automatically redirects us to the Cloudflare waiting room, where Cloudflare checks whether our connection is secure.

Cloudflare waiting room

Since we're sending this request from an actual browser, Cloudflare accepts our connection and redirects us to the original home page.

Author Today site

Now, let's try accessing this website's content with Cloudscraper.

import cloudscraper 
 
url = "https://author.today/" 
scraper = cloudscraper.create_scraper() 
info = scraper.get(url) 
print("the status code is ", info.status_code) 
print(info.text)

And it produces the following result:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

As you can see, Cloudscraper doesn't work against this newer Cloudflare version.

The displayed error message suggests that Cloudscraper has a paid version that would work. Unfortunately, that's not the case.

So, how can you solve this problem?

Cloudflare uses a range of frequently updated techniques to detect and block bots.

To get past them, you need to imitate natural user behavior as closely as possible. Browser automation tools like Selenium or Puppeteer, which can drive headless browsers, help with that, especially when combined with realistic HTTP request headers.

However, these approaches also have their limitations and don't always work.
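As a starting point for the "realistic headers" part, you can send the headers a desktop browser typically includes. The browser_like_headers() helper and the Chrome user agent string below are illustrative placeholders, not values known to pass Cloudflare; on their own, headers won't solve a JavaScript challenge.

```python
# Placeholder desktop Chrome user agent; replace it with a current, real one.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
)


def browser_like_headers(user_agent: str = DEFAULT_USER_AGENT,
                         language: str = "en-US,en;q=0.9") -> dict:
    """Hypothetical helper: headers resembling those a desktop browser sends."""
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": language,
    }
```

You would pass the result to Requests, e.g. requests.get("https://opensea.io/rankings", headers=browser_like_headers(), timeout=30).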

What Is a Good Cloudscraper Alternative?

If you've run into trouble with newer Cloudflare versions, it's time to switch tools!

ZenRows is a powerful web scraping API that helps you bypass Cloudflare, even as Cloudflare keeps updating its protections.

Let's try scraping our target website with it!

Start by creating a free account to get your free API key.

Once logged in, you'll see ZenRows' Request Builder. Now, do the following.

  1. Enter your target website URL to scrape directly from your dashboard UI.
  2. Select Python as the language and API as the mode, then enable the Antibot and JavaScript rendering features.
  3. Click on Try It to see if it works.
ZenRows dashboard
  4. Check the scraping result ZenRows displays at the bottom of the page.

Here's what we got:

ZenRows bypassed Cloudflare

Yay! 🥳 While Cloudscraper failed against the newer Cloudflare version, ZenRows succeeded.

With its intuitive API, you can get past antibot protections and extract the information you need from protected websites.
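If you'd rather call the API from your own code than from the dashboard, the request is a plain HTTP GET. The sketch below only builds the request URL with the standard library. The endpoint and the apikey, url, js_render, and antibot parameter names are assumptions based on ZenRows' documentation at the time of writing, so compare them with the Python snippet the Request Builder generates for you.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the code generated by the Request Builder.
ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"


def build_zenrows_url(target_url: str, api_key: str,
                      js_render: bool = True, antibot: bool = True) -> str:
    """Hypothetical helper: build a ZenRows API URL for target_url."""
    params = {"url": target_url, "apikey": api_key}
    if js_render:
        params["js_render"] = "true"  # assumed flag for JavaScript rendering
    if antibot:
        params["antibot"] = "true"  # assumed flag for the Antibot feature
    # urlencode percent-encodes the target URL so it survives as a query value.
    return f"{ZENROWS_ENDPOINT}?{urlencode(params)}"


print(build_zenrows_url("https://author.today/", "YOUR_API_KEY"))
```

You'd then fetch it with requests.get(build_zenrows_url("https://author.today/", "YOUR_API_KEY"), timeout=60).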

Furthermore, ZenRows can scale your web scraping efforts, so don't hesitate to try it for free.

Conclusion

We saw that Cloudscraper in Python is helpful against older Cloudflare versions, but bypassing newer ones requires a different tool, such as ZenRows.

You can also save time and reduce costs by using a web scraping API designed to handle all sorts of anti-scraping protections and keep up with their updates.
