
How to Bypass Cloudflare in Python

January 12, 2023 · 5 min read

Getting detected by Cloudflare Bot Manager while scraping is quite frequent and can slow down your scraping process or even put a stop to the operation. The best way to avoid this is by making use of popular libraries created to get around this anti-bot protection.

In this article, we'll cover some proven tools to bypass Cloudflare in Python and share advice on using them to scrape the web pages whose data you're interested in.

Let's get started!

What Is Cloudflare Bot Manager?

Cloudflare Bot Manager is one of the most widely deployed web security systems for mitigating attacks from malicious bots. Unfortunately for us, legitimate web scrapers can get flagged as well.

Cloudflare's bot detection techniques include TLS fingerprinting, event tracking and canvas fingerprinting. If you've tried to scrape a Cloudflare-protected site before, you may have seen errors such as:

  • Error 1020: access denied.
  • Error 1010: the owner of this website has banned your access based on your browser's signature.
  • Error 1015: you are being rate limited.
  • Error 1012: access denied.

These are usually accompanied by a Cloudflare 403 Forbidden HTTP response status code.
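
When a request is blocked, it can help to identify which of these errors came back before deciding what to do next. The sketch below is a minimal, hypothetical helper: the name find_cloudflare_error and the regular expression are assumptions for illustration, not part of any library, and the real block page may word the error differently.

```python
import re

# Error codes and short descriptions taken from the list above.
CLOUDFLARE_ERRORS = {
    "1020": "access denied",
    "1010": "banned based on browser signature",
    "1015": "rate limited",
    "1012": "access denied",
}

# Assumption: the block page mentions the code as "Error 1020" or
# "error code: 1020". Adjust the pattern to the markup you actually receive.
_ERROR_PATTERN = re.compile(r"error(?:\s+code)?\s*:?\s*(10\d{2})", re.IGNORECASE)


def find_cloudflare_error(body):
    """Return (code, description) for the first known error code in body, else None."""
    for match in _ERROR_PATTERN.finditer(body):
        code = match.group(1)
        if code in CLOUDFLARE_ERRORS:
            return code, CLOUDFLARE_ERRORS[code]
    return None


print(find_cloudflare_error("<h1>Error 1020</h1><p>Access denied</p>"))
```

A scraper could call this on response.text and, for instance, slow down when it sees 1015 rather than retrying immediately.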

Can Cloudflare Detect Python Scrapers?

Yes, Cloudflare is capable of detecting Python scrapers since they're not whitelisted and it assumes they're malicious by default. Therefore, your web scraper can get denied access to a web page.

Let's run through a quick scraping example using the requests Python library to scrape Opensea.io, an NFT trading platform that uses Cloudflare as its main anti-bot protection.

We'll start by installing the library:

Terminal
pip install requests

And then we'll send a request to the target website:

scraper.py
# A plain scraping attempt with requests
import requests 
 
scraper = requests.get('https://opensea.io/rankings/trending').text 
print(scraper)

It didn't work. 😢

Our requests-based scraper returns raw HTML with the error code at the top:

Error with Requests

This shows that plain requests is not a reliable way to get past Cloudflare's security measures, as it often returns an access denied error. So how do you avoid Cloudflare detection while scraping with Python? Let's get into that.


How to Bypass Cloudflare in Python

There are different libraries to bypass Cloudflare while web scraping in Python:

  • ZenRows.
  • cloudscraper.
  • cfscrape.
  • undetected_chromedriver.

Let's take a look at these tools and how they can be used successfully.

ZenRows

The best way to bypass Cloudflare with Python is using ZenRows. It's a web scraping API capable of bypassing Cloudflare in Python with a single request. It simplifies the process of integrating scraping tasks into your workflow with its advanced anti-bot features and proxy modes.

👍 Pros:

  • Easy to use.
  • Capable of bypassing anti-bots, like Cloudflare and CAPTCHAs.
  • It can bypass Cloudflare v2 challenge CAPTCHA.
  • Smart rotating and premium proxies are included.
  • ZenRows can scrape JavaScript-rendered pages.
  • It's compatible with other libraries, making it easy to integrate into your existing workflows.
  • Chat support done by developers.
  • Constantly updated.

👎 Cons:

  • It's a paid service, but offers a free trial.

How to Bypass Cloudflare in Python Using ZenRows

To scrape data from unprotected sources, you'll only need two pieces of information: a free API key and the URL of your target website.

Thus, getting back to our case of scraping the Opensea website, you just 1) import the requests library and 2) send a get() request to the ZenRows API with the URL you want to scrape.

scraper.py
import requests 
 
response = requests.get("https://api.zenrows.com/v1/?apikey=YOUR_API_KEY&url=https%3A%2F%2Fopensea.io%2Frankings%2Ftrending") 
 
print(response.text)

To bypass Cloudflare, add the antibot=true parameter to your request, along with premium_proxy=true and proxy_country:

scraper.py
response_antibot = requests.get("https://api.zenrows.com/v1/?apikey=YOUR_API_KEY&url=https%3A%2F%2Fopensea.io%2Frankings%2Ftrending&antibot=true&premium_proxy=true&proxy_country=us") 
 
print(response_antibot.text)

To scrape a specific piece of information, use the Wait For Selector feature by adding &js_render=true&wait_for=.content, where .content is the CSS selector of the element you need. This makes ZenRows wait for that content to load before returning the page.

scraper.py
response_specific = requests.get("https://api.zenrows.com/v1/?apikey=YOUR_API_KEY&url=https%3A%2F%2Fopensea.io%2Frankings%2Ftrending&js_render=true&wait_for=.content") 
 
print(response_specific.text)
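
The three requests above differ only in their query parameters, so the URL can be assembled programmatically instead of being percent-encoded by hand. The following is a minimal sketch using only the standard library; build_zenrows_url is a hypothetical helper name, and the parameters are the ones already shown in this article.

```python
from urllib.parse import urlencode

ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"


def build_zenrows_url(api_key, target_url, **options):
    """Return the request URL with every parameter percent-encoded."""
    params = {"apikey": api_key, "url": target_url}
    params.update(options)  # e.g. antibot="true", proxy_country="us"
    return f"{ZENROWS_ENDPOINT}?{urlencode(params)}"


target = "https://opensea.io/rankings/trending"
basic_url = build_zenrows_url("YOUR_API_KEY", target)
antibot_url = build_zenrows_url(
    "YOUR_API_KEY",
    target,
    antibot="true",
    premium_proxy="true",
    proxy_country="us",
)

print(basic_url)
print(antibot_url)
```

The resulting strings can be passed straight to requests.get(). Alternatively, requests accepts the same dictionary through its params argument and does the encoding itself.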

In just a few seconds, ZenRows API will return the webpage content. Here's what we got from the Opensea web page:

Output
<!DOCTYPE html><html lang="en-US"><head><meta charSet="utf-8"/><meta content="width=device-width,initial-scale=1" name="viewport"/><link href="https://opensea.io/rankings/trending" hrefLang="en" rel="alternate"/><link href="https://opensea.io/zh-CN/rankings/trending" hrefLang="zh-CN" rel="alternate"/><link href="https://opensea.io/zh-TW/rankings/trending" hrefLang="zh-TW" rel="alternate"/><link href="https://opensea.io/de-DE/rankings/trending" hrefLang="de-DE" rel="alternate"/><link href="https://opensea.io/es/rankings/trending" hrefLang="es" rel="alternate"/><link href="https://opensea.io/fr/rankings/trending" hrefLang="fr" rel="alternate"/><link href="https://opensea.io/kr/rankings/trending" hrefLang="kr" rel="alternate"/><link href="https://opensea.io/ja/rankings/trending" hrefLang="ja" rel="alternate"/><link rel="preload"......

That's all! You can now bypass Cloudflare from Python with a single request.

cloudscraper

cloudscraper was built as an easy-to-use Python library for bypassing Cloudflare. The package is very similar to requests in functionality and in the parameters it accepts. Its JavaScript engine makes it possible to decode and parse JavaScript by imitating the behavior of a regular web browser.

👍 Pros:

  • Easy to use.

👎 Cons:

  • It fails on websites using Cloudflare v2 challenge CAPTCHA.
  • Difficult for beginners.
  • Not updated frequently.
  • It doesn't work well in large-scale scraping projects.

How to Bypass Cloudflare in Python Using cloudscraper

To use cloudscraper in Python to bypass Cloudflare, start by installing it:

Terminal
pip install cloudscraper

The fastest way to use cloudscraper is to call create_scraper(). The object it returns works the same way as a requests session object; you just replace calls to requests.get() or requests.post() with scraper.get() or scraper.post().

scraper.py
import cloudscraper 
 
scraper = cloudscraper.create_scraper(delay=10, browser="chrome") 
content = scraper.get("https://opensea.io/rankings/trending").text 
 
print(content)

The cloudscraper package should be complemented with an additional library like BeautifulSoup4 to parse the scraped data:

scraper.py
from bs4 import BeautifulSoup as bs

# Parse the HTML returned by cloudscraper (content comes from the previous snippet)
soup = bs(content, "html.parser")
# These classes are not reliable, added here for demo purposes
# select() takes a CSS selector; find_all() expects a tag name instead
elements = soup.select(".eqFKWH .hmMxZB .mGAUR")

scraped_data = list()
for data in elements:
	scraped_data.append(data.get_text())

print(scraped_data)

Boom! Running the script should scrape the target website and your result should look like this:

Output
[ 
	'PATCHWORKS', 
	'Moonrunners Official', 
	'Frog Affirmation Project (FAP)', 
	'Checks - VV Edition', 
	… 
]

However, the downside of using the cloudscraper library is that it can't bypass the Cloudflare v2 challenge. This means that if you encounter a website that uses this type of protection, your scraper becomes ineffective. For example, if you try to scrape forever21.com, cloudscraper will return the following error message:

Output
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

A possible solution is to use a third-party CAPTCHA solver, or a web scraping API that provides anti-bot bypass such as ZenRows.
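
One way to wire up such a fallback is to try the free library first and switch to the API only when a challenge error is raised. The sketch below is a hedged illustration: fetch_with_fallback and the two stand-in fetchers are hypothetical names, and FakeChallengeError stands in for cloudscraper.exceptions.CloudflareChallengeError so the example runs without network access.

```python
def fetch_with_fallback(url, primary, fallback, challenge_errors=(Exception,)):
    """Return primary(url), or fallback(url) if primary raises a challenge error."""
    try:
        return primary(url)
    except challenge_errors:
        return fallback(url)


class FakeChallengeError(Exception):
    """Stand-in for cloudscraper.exceptions.CloudflareChallengeError."""


def blocked_fetch(url):
    # Simulates cloudscraper hitting a v2 challenge.
    raise FakeChallengeError("Detected a Cloudflare version 2 Captcha challenge")


def api_fetch(url):
    # Simulates a successful request through a scraping API.
    return f"<html>fetched {url} via API</html>"


html = fetch_with_fallback(
    "https://example.com", blocked_fetch, api_fetch, (FakeChallengeError,)
)
print(html)
```

In a real script, primary could be lambda u: scraper.get(u).text with challenge_errors=(cloudscraper.exceptions.CloudflareChallengeError,), and fallback could send the same URL to the ZenRows endpoint shown earlier.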

cfscrape

The cfscrape package is another popular choice for bypassing Cloudflare in Python due to its low technical complexity. It's built on top of the requests module, so you interact with the cfscrape scraper the same way you would with requests. Its simplicity makes it a great choice for those looking to get started with web scraping without the need for advanced technical skills.

However, cfscrape isn't perfect: it can only handle web pages that use the classic Cloudflare anti-bot protection, meaning it's ineffective against reCAPTCHA challenges.

👍 Pros:

  • Easy to use and implement.

👎 Cons:

  • Ineffective with reCAPTCHA challenges.
  • It lacks maintenance and updates.
  • Not as feature-rich as other scraping libraries.
  • It can't handle large-scale scraping.

How to Bypass Cloudflare in Python Using cfscrape

To use cfscrape to bypass Cloudflare in Python, run the installation command via pip.

Terminal
pip install cfscrape

The next step is to import the module and call the create_scraper() method. The rest works the same way as the requests library: requests made through the scraper attempt to solve Cloudflare's anti-bot challenge before fetching the page content.

scraper.py
import cfscrape 
 
scraper = cfscrape.create_scraper() 
scraped_data = scraper.get('https://opensea.io/rankings/trending') 
print(scraped_data.text)

The library returns the same HTML we saw in the previous example.

undetected_chromedriver

undetected-chromedriver, developed as an extension to Selenium, stands out among similar tools for its ability to bypass bot protection software. The module automatically downloads a driver binary to your system and patches it.

👍 Pros:

  • It can bypass bot protection.
  • It automatically loads and patches a driver binary.

👎 Cons:

  • It's slow compared to other web scraping tools.
  • Inefficient for large-scale web scraping tasks.

How to Bypass Cloudflare in Python Using undetected_chromedriver

To use undetected-chromedriver for Python Cloudflare bypass, start by installing it:

Terminal
pip install undetected-chromedriver

Now, import undetected_chromedriver and call uc.Chrome() to create a Chrome browser object, then call driver.get() to navigate to the URL you want to scrape.

scraper.py
import undetected_chromedriver as uc 
driver = uc.Chrome() 
driver.get('https://opensea.io/rankings/trending')

It's important to note that the undetected_chromedriver library is only designed to bypass Cloudflare's security measures and can't be used as a primary solution for complex scraping. Therefore, you'll have to combine this module with other libraries to scrape data from the website.
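
One way to combine the driver with a parser is to hand driver.page_source to your extraction code. The sketch below is a minimal illustration under stated assumptions: fetch_page_source, LinkTextParser and collect_link_texts are hypothetical names, the standard-library HTMLParser is used only to keep the example dependency-free (BeautifulSoup would work just as well), and the sample HTML is made up for the demo.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self._inside_link = False
        self._chunks = []      # text pieces of the link being read
        self.link_texts = []   # finished link texts

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_link = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_link:
            self._inside_link = False
            text = "".join(self._chunks).strip()
            if text:
                self.link_texts.append(text)

    def handle_data(self, data):
        if self._inside_link:
            self._chunks.append(data)


def collect_link_texts(html):
    """Return the text of each link found in an HTML string."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.link_texts


def fetch_page_source(url):
    """Open url in the patched Chrome and return the rendered HTML."""
    import undetected_chromedriver as uc  # lazy import; needs Chrome installed

    driver = uc.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Made-up markup standing in for a rendered page.
sample = '<ul><li><a href="/a">PATCHWORKS</a></li><li><a href="/b">Moonrunners <b>Official</b></a></li></ul>'
links = collect_link_texts(sample)
print(links)
```

With Chrome available, collect_link_texts(fetch_page_source(url)) would run the same extraction on a live page.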

Running scraper.py opens the target page in the patched browser:

Output

Conclusion

Knowing how to bypass anti-bots is as important as the scraping process itself, especially when you're looking to scrape a web page protected by Cloudflare. In this article, we covered the different tools that can be used to bypass Cloudflare using Python: ZenRows, cloudscraper, cfscrape and undetected-chromedriver.

While most of these tools can help you avoid Cloudflare detection in Python, the open-source ones fall short in large-scale scraping or against advanced Cloudflare security measures, like the Cloudflare v2 challenge CAPTCHA. Of the options covered here, ZenRows is the only one built to handle any type of anti-bot, and you can get your free API key now.
