Cloudscraper With JavaScript: How It Works, Alternatives

September 20, 2024 · 3 min read

Table of contents

What is Cloudscraper?
Can you bypass Cloudflare with Cloudscraper?
Cloudscraper alternatives for JavaScript
- Web scraping API (ZenRows)
- Puppeteer
- Undetected ChromeDriver
Conclusion

Trying to bypass Cloudflare using JavaScript?

Cloudflare is a real pain for web scrapers: it's like a giant wall between you and the data you need. The Cloudscraper package was supposed to be our secret weapon against Cloudflare's "I'm Under Attack Mode" (IUAM). And while it started as a Python library, there's a JavaScript version too.

But does it actually work?

What Is Cloudscraper and How Does It Work With JavaScript?

Cloudflare treats web scrapers as potentially malicious and blocks them, which makes it one of the toughest web application firewalls (WAFs) for scrapers to get past. Cloudscraper tries to get around this with a lightweight API designed to bypass Cloudflare's challenges.

The JavaScript version works much like the Python one. The main difference is that it uses request-promise as its default requester.

When you make a request with Cloudscraper's JavaScript version, it detects the Cloudflare challenge page, solves the JavaScript challenge, submits the answer back to Cloudflare, and, if Cloudflare accepts it, returns the content of the page you originally requested.
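The first step, recognizing that a response is a Cloudflare challenge at all, can be illustrated with a simple heuristic. The sketch below is not Cloudscraper's actual detection code. It assumes that Cloudflare identifies itself through the `server` or `cf-ray` response headers and that challenge pages come back with a 403 or 503 status. The helper name `looksLikeCloudflareChallenge` is hypothetical.

```javascript
// Hypothetical helper (not part of Cloudscraper): a rough heuristic for
// recognizing a Cloudflare challenge response. It assumes Cloudflare
// identifies itself via the "server" or "cf-ray" response headers and that
// challenge pages are served with a 403 or 503 status.
function looksLikeCloudflareChallenge(statusCode, headers = {}) {
  // Lowercase header names and values so lookups are case-insensitive
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value).toLowerCase();
  }

  const servedByCloudflare =
    normalized['server'] === 'cloudflare' || 'cf-ray' in normalized;
  const challengeStatus = statusCode === 403 || statusCode === 503;

  return servedByCloudflare && challengeStatus;
}

// Example usage with hand-written response metadata
console.log(looksLikeCloudflareChallenge(403, { Server: 'cloudflare' })); // true
console.log(looksLikeCloudflareChallenge(200, { Server: 'cloudflare' })); // false
console.log(looksLikeCloudflareChallenge(403, { Server: 'nginx' }));      // false
```

A real solver would also inspect the response body before doing anything else, because a 403 from a Cloudflare-served site can have other causes.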

Sounds great in theory, but is it enough to actually get past Cloudflare and grab the data you want?

Let's test it and see what happens.

Can You Bypass Cloudflare With Cloudscraper and JavaScript?

Can Cloudscraper.js bypass Cloudflare?

The proof is in the pudding.

Let's put Cloudscraper to the test in a real-world scenario, using the ScrapingCourse Cloudflare Challenge page as the target website.

To follow along, install the Cloudscraper package and save the request module, which Cloudscraper needs as a peer dependency, using the following commands.

                    Terminal
                
npm install cloudscraper
npm install --save request


Now, make a request to the target page and log the response.

                    Example
                
// import the required library
const cloudscraper = require('cloudscraper');

// make a GET request to the target webpage and log the response (or the error)
cloudscraper.get('https://www.scrapingcourse.com/cloudflare-challenge')
    .then(console.log, console.error);


The script above tries to bypass the Cloudflare protection on the target page. However, we get the following response.

                    Output
                
StatusCodeError: 403


A 403 Forbidden status code means the server understood the request but refused to fulfill it. In this case, Cloudflare flagged the Cloudscraper request as coming from a bot and denied access.

This can happen for several reasons, the most notable being that Cloudscraper's JavaScript version is not actively maintained and can't keep up with Cloudflare's evolving anti-bot mechanisms.
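If you still run Cloudscraper, it's worth handling this failure explicitly instead of only printing the raw error. Below is a minimal sketch that assumes the rejection is request-promise's `StatusCodeError`, which carries the HTTP status in its `statusCode` property. The helper name `describeScrapeFailure` and its messages are hypothetical.

```javascript
// Hypothetical helper (name and messages are ours): turn a failed request
// into a readable message. request-promise rejects non-2xx responses with a
// StatusCodeError whose `statusCode` property holds the HTTP status.
function describeScrapeFailure(err) {
  if (err && err.name === 'StatusCodeError') {
    if (err.statusCode === 403) {
      return 'Blocked: the server refused the request (HTTP 403)';
    }
    return `Request failed with HTTP status ${err.statusCode}`;
  }
  // Anything else: network errors, parsing errors, and so on
  return `Request failed: ${err && err.message ? err.message : String(err)}`;
}

// Usage with Cloudscraper (requires the packages installed earlier):
// cloudscraper.get(url).then(console.log).catch((err) => console.error(describeScrapeFailure(err)));

// Offline demo with a hand-built error object
const fakeError = Object.assign(new Error('403'), { name: 'StatusCodeError', statusCode: 403 });
console.log(describeScrapeFailure(fakeError)); // Blocked: the server refused the request (HTTP 403)
```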

While this answers your question about Cloudscraper's viability, it raises another question: "How, then, can you bypass Cloudflare?"

Let's explore some alternative methods next.

Cloudscraper Alternatives for JavaScript

To bypass Cloudflare, you must completely emulate natural browsing behavior. Here are three Cloudscraper alternatives for JavaScript, some more effective than others.

Method 1: Web Scraping API (ZenRows)

Web scraping APIs like ZenRows are the most effective and only surefire way to bypass Cloudflare and other anti-bot systems. Here's why.

Cloudflare uses numerous detection measures to identify and block bots. As these techniques continuously evolve, it becomes increasingly challenging and time-consuming to account for all of them manually.

This is where ZenRows plays a key role.

ZenRows handles all the technical aspects of emulating natural browsing behavior, so you can focus on extracting the data you need.

ZenRows offers numerous features out of the box, including auto-rotating user agents, premium proxies, anti-CAPTCHA, and much more. Its headless browser functionality allows you to render JavaScript like a regular browser, further cementing its ability to imitate user behavior.

All this and more make ZenRows the most effective option for bypassing Cloudflare and any anti-bot system.

See for yourself.

Here's a quick example that uses ZenRows to scrape the same Cloudflare Challenge page that blocked Cloudscraper.

To follow along, sign up for free to get your API key. You'll be redirected to the Request Builder page.

(Image: building a scraper with ZenRows)

Input the target URL and activate Premium Proxies and the JS Rendering mode. In some cases, ZenRows automatically activates these parameters.

Select the Node.js language option on the right and choose the API mode. This will generate your request code.

Copy the code and use your preferred HTTP client to make a request to the ZenRows API. The code below uses Axios.

                    Example
                
// npm install axios
const axios = require('axios');

const url = 'https://www.scrapingcourse.com/cloudflare-challenge';
const apikey = '<YOUR_ZENROWS_API_KEY>';
axios({
    url: 'https://api.zenrows.com/v1/',
    method: 'GET',
    params: {
        'url': url,
        'apikey': apikey,
        'js_render': 'true',
        'premium_proxy': 'true',
    },
})
    .then(response => console.log(response.data))
    .catch(error => console.log(error));

Here's the result:

                    Output
                
<html lang="en">
<head>
    <!-- ... -->
    <title>Cloudflare Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Cloudflare challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Awesome, right? That's how easy it is to bypass Cloudflare using ZenRows.
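Axios serializes its `params` object into a query string. The sketch below builds the same ZenRows request URL with Node's built-in `URL` class and adds a quick check that the returned HTML is the unlocked demo page and not a challenge. The helper names are ours. The success check relies only on the heading shown in the output above, and regex-based title extraction is a shortcut, not a substitute for an HTML parser.

```javascript
// Hypothetical helpers (names are ours), using only built-in Node.js APIs.

// Build the same request URL that the Axios example sends: Axios turns its
// `params` object into this query string for you.
function buildZenRowsUrl(targetUrl, apiKey) {
  const endpoint = new URL('https://api.zenrows.com/v1/');
  endpoint.searchParams.set('url', targetUrl); // URL-encoded automatically
  endpoint.searchParams.set('apikey', apiKey);
  endpoint.searchParams.set('js_render', 'true');
  endpoint.searchParams.set('premium_proxy', 'true');
  return endpoint.toString();
}

// Pull the <title> text out of an HTML string (a quick regex check,
// not a substitute for a real HTML parser).
function extractTitle(html) {
  const match = html.match(/<title>([\s\S]*?)<\/title>/i);
  return match ? match[1].trim() : null;
}

// The demo page shows this heading only once the challenge is passed.
function bypassedDemoChallenge(html) {
  return html.includes('You bypassed the Cloudflare challenge!');
}

// Offline demo on a trimmed copy of the output shown earlier
const sampleHtml = `<html lang="en">
<head><title>Cloudflare Challenge - ScrapingCourse.com</title></head>
<body><h2>You bypassed the Cloudflare challenge! :D</h2></body>
</html>`;
console.log(extractTitle(sampleHtml)); // Cloudflare Challenge - ScrapingCourse.com
console.log(bypassedDemoChallenge(sampleHtml)); // true

// Live usage (Node.js 18+ provides a global fetch), inside an async function:
// const url = buildZenRowsUrl('https://www.scrapingcourse.com/cloudflare-challenge', '<YOUR_ZENROWS_API_KEY>');
// const html = await (await fetch(url)).text();
// console.log(bypassedDemoChallenge(html) ? 'Bypassed' : 'Still blocked');
```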

Method 2: Puppeteer

Another way to emulate natural user behavior is by running live browser instances. This allows you to handle JavaScript challenges and also render dynamic pages like a regular browser.

Puppeteer, the most popular JavaScript headless browser library, lets you do that. It offers a high-level API for automating Chrome via the DevTools Protocol, so you can interact with web page elements as you would in a regular browser.
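Here is a minimal sketch of that workflow. It assumes a recent Puppeteer version is installed (`npm install puppeteer`). The script launches headless Chrome, waits for network activity to settle so JavaScript-rendered content loads, and returns the rendered HTML. The function name `scrapeWithPuppeteer` is our own. Because the script applies no anti-detection measures, expect a heavily protected page like the Cloudflare Challenge to block it.

```javascript
// Minimal Puppeteer sketch (function name is ours): open a page in headless
// Chrome and return the rendered HTML. Requires `npm install puppeteer`.
async function scrapeWithPuppeteer(url) {
  // Required here so the helper can be loaded even where Puppeteer isn't installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so dynamic content has rendered
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

You could call it with `scrapeWithPuppeteer('https://www.scrapingcourse.com/cloudflare-challenge').then(console.log, console.error)`. In a real project, a top-level `require('puppeteer')` works just as well.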

Moreover, you can add configurations such as premium proxies and user agent rotation to improve your chances.
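The sketch below shows one simple way to wire both ideas into a Puppeteer setup. Everything in it is illustrative: `createUserAgentRotator` and `buildLaunchOptions` are hypothetical helpers, and the user agent strings and proxy address are placeholders for real values. It relies on Chrome's `--proxy-server` switch and Puppeteer's `page.setUserAgent()`.

```javascript
// Hypothetical helpers: rotate through a list of user agents and build
// Puppeteer launch options that route traffic through a proxy.
// The user agent strings and proxy address are placeholders.
function createUserAgentRotator(userAgents) {
  let index = 0;
  // Round-robin: each call returns the next user agent, wrapping around
  return () => {
    const userAgent = userAgents[index];
    index = (index + 1) % userAgents.length;
    return userAgent;
  };
}

function buildLaunchOptions(proxyServer) {
  const args = [];
  if (proxyServer) {
    // Chrome's --proxy-server switch, e.g. "http://host:port"
    args.push(`--proxy-server=${proxyServer}`);
  }
  return { headless: true, args };
}

const nextUserAgent = createUserAgentRotator(['<USER_AGENT_1>', '<USER_AGENT_2>']);
console.log(nextUserAgent()); // <USER_AGENT_1>
console.log(nextUserAgent()); // <USER_AGENT_2>

// Usage with Puppeteer (inside an async function):
// const browser = await puppeteer.launch(buildLaunchOptions('http://<PROXY_HOST>:<PORT>'));
// const page = await browser.newPage();
// await page.setUserAgent(nextUserAgent());
```

Round-robin is the simplest rotation policy. Picking a user agent at random for each session is a common alternative.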

Unlike Cloudscraper, Puppeteer is actively maintained by the Chrome browser automation team and boasts one of the largest developer communities.

For more information, check out this guide on Puppeteer web scraping.

However, you should know that Puppeteer doesn't guarantee a 100% success rate. Advanced Cloudflare systems can easily detect its automation properties and deny you access.

In such cases, consider web scraping APIs like ZenRows.

Method 3: Undetected ChromeDriver

Undetected ChromeDriver (UC) patches the standard Selenium ChromeDriver to avoid triggering anti-bot detection mechanisms. This makes it harder for websites to identify your scraper.

Although it's originally a Python library, you can adapt it for JavaScript web scraping.

This blog post on using Undetected ChromeDriver with Node.js walks through the process step by step.

Like Puppeteer, UC has its limitations. It may get past basic anti-bot protection, but advanced Cloudflare systems will still block your requests.

Conclusion

Although Cloudscraper is not the best option for bypassing Cloudflare using JavaScript, other options can get you over the hump. The three discussed in this article (ZenRows, Puppeteer, and Undetected ChromeDriver) are powerful alternatives.

However, the most effective and only surefire way to avoid getting blocked while web scraping in JavaScript is to use web scraping APIs like ZenRows.

Try ZenRows now for free!