Are you looking to bypass Cloudflare using JavaScript?
The Cloudscraper package is designed to bypass Cloudflare's I'm Under Attack Mode (IUAM), a significant challenge for web scrapers. While it's originally a Python library, Cloudscraper also has a JavaScript version.
But is it a viable option?
In this article, we'll review Cloudscraper.js's potential and also recommend more efficient alternatives where necessary.
What Is Cloudscraper and How Does It Work With JavaScript?
Cloudflare often deems web scraping requests to be malicious and blocks them accordingly. To circumvent this, Cloudscraper offers a lightweight API that bypasses Cloudflare's challenges.
Its JavaScript version works similarly to its Python counterpart, the only difference being that it uses request-promise as the default requester.
When you make a request using Cloudscraper.js, it automatically detects Cloudflare, extracts the corresponding challenge, solves it, and injects the result into the challenge page before returning the body of the requested page.
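To make the first step of that flow more concrete, here's a minimal sketch of how a script might guess that a response is a Cloudflare challenge page. The helper name `looksLikeCloudflareChallenge`, the status codes, and the body markers are assumed heuristics for illustration, not Cloudscraper's actual detection logic.

```javascript
// Hypothetical helper: guess whether an HTTP response is a Cloudflare challenge page.
// The status codes, header check, and body markers are assumed heuristics,
// not Cloudscraper's actual detection logic.
function looksLikeCloudflareChallenge({ statusCode, headers = {}, body = '' }) {
  // Challenge pages are usually served with an error status
  const blockedStatus = statusCode === 403 || statusCode === 503;
  // Cloudflare identifies itself in the Server response header
  const servedByCloudflare = String(headers.server || '').toLowerCase() === 'cloudflare';
  // Assumed markers that often appear in challenge HTML
  const hasChallengeMarkup = /just a moment|cf-chl|challenge-platform/i.test(String(body));
  return blockedStatus && servedByCloudflare && hasChallengeMarkup;
}
```

A check like this only tells you that you were challenged. Solving the challenge is the hard part.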
Is this enough to bypass Cloudflare and access your desired data?
Let's find out.
Can You Bypass Cloudflare With Cloudscraper and JavaScript?
Can Cloudscraper.js bypass Cloudflare?
The proof is in the pudding.
Let's put Cloudscraper to the test in a real-world scenario, using Cloudflare Challenge as the target website.
To follow along, install the Cloudscraper package and save the request module as a dependency using the following commands.
npm install cloudscraper
npm install --save request
Now, make a request to the target page and log the response.
// import the required library
const cloudscraper = require('cloudscraper');
// make a GET request to the target webpage and log the body or the error
cloudscraper.get('https://www.scrapingcourse.com/cloudflare-challenge')
  .then(console.log, console.error);
The script above tries to bypass the Cloudflare protection on the target page. However, we get the following response.
StatusCodeError: 403
The 403 status code means the server understood the request but refused to fulfill it: Cloudflare flagged the Cloudscraper request as malicious and denied access.
This can happen for several reasons, the most notable being that Cloudscraper.js is not actively maintained and cannot keep up with Cloudflare's evolving anti-bot mechanisms.
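If you want your script to react to a block instead of just printing the raw error, you could wrap the request in a small handler. The sketch below assumes the rejection carries a numeric `statusCode` property, as the `StatusCodeError: 403` output suggests. `describeFailure` and `tryCloudscraper` are hypothetical helper names.

```javascript
// Sketch: turn a failed Cloudscraper request into a readable message.
// Assumes the rejection exposes a numeric `statusCode`, as the
// "StatusCodeError: 403" output suggests.
function describeFailure(err) {
  if (err && err.statusCode === 403) {
    return 'Blocked: Cloudflare refused the request (403)';
  }
  if (err && typeof err.statusCode === 'number') {
    return `Request failed with HTTP status ${err.statusCode}`;
  }
  // Network errors, timeouts, and anything else without a status code
  return `Request failed: ${err && err.message ? err.message : String(err)}`;
}

async function tryCloudscraper(url) {
  // Required lazily so describeFailure can be used without the package installed
  const cloudscraper = require('cloudscraper');
  try {
    return await cloudscraper.get(url);
  } catch (err) {
    console.error(describeFailure(err));
    return null; // signal the caller to fall back to another method
  }
}
```

Calling `tryCloudscraper('https://www.scrapingcourse.com/cloudflare-challenge')` would then resolve to `null` on a block, which makes it straightforward to switch to one of the alternatives.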
While this answers your question about Cloudscraper's viability, it raises another question: "How, then, can you bypass Cloudflare?"
Let's explore some alternative methods next.
Cloudscraper Alternatives for JavaScript
To bypass Cloudflare, you must completely emulate natural browsing behavior. Here are three Cloudscraper alternatives for JavaScript, some more effective than others.
Method 1: Web Scraping API (ZenRows)
Web scraping APIs like ZenRows are the most effective and only surefire way to bypass Cloudflare and other anti-bot systems. Here's why.
Cloudflare uses numerous detection measures to identify and block bots. As these techniques continuously evolve, it becomes increasingly challenging and time-consuming to account for all of them manually.
This is where ZenRows plays a key role.
By handling all the technical aspects of emulating natural browsing behavior, this solution allows you to focus on extracting the data you need.
ZenRows offers numerous features out of the box, including auto-rotating user agents, premium proxies, anti-CAPTCHA, and much more. Its headless browser functionality allows you to render JavaScript like a regular browser, further cementing its ability to imitate user behavior.
All this and more make ZenRows the most effective option for bypassing Cloudflare and any anti-bot system.
See for yourself.
Here's a quick example: using ZenRows to scrape the same Cloudflare Challenge page that blocked Cloudscraper.
To follow along, sign up for free to get your API key. You'll be redirected to the Request Builder page.
Input the target URL and activate Premium Proxies and the JS Rendering mode. In some cases, ZenRows automatically activates these parameters.
Select the Node.js language option on the right and choose the API mode. This will generate your request code.
Copy the code and use your preferred HTTP client to make a request to the ZenRows API. The code below uses Axios.
// npm install axios
const axios = require('axios');
const url = 'https://www.scrapingcourse.com/cloudflare-challenge';
const apikey = '<YOUR_ZENROWS_API_KEY>';
axios({
  url: 'https://api.zenrows.com/v1/',
  method: 'GET',
  params: {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
  },
})
  .then(response => console.log(response.data))
  .catch(error => console.log(error));
Here's the result:
<html lang="en">
<head>
    <!-- ... -->
    <title>Cloudflare Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Cloudflare challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>
Awesome, right? That's how easy it is to bypass Cloudflare using ZenRows.
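If you'd rather not add Axios as a dependency, the same request can be sketched with the `fetch` function built into Node.js 18 and later. The endpoint and query parameters are the ones from the Axios example. The helper names `buildZenRowsUrl` and `scrapeWithZenRows` are hypothetical, and the status check is an assumption about how you might want to handle failures.

```javascript
// Sketch: the same ZenRows call using Node's built-in fetch (Node 18+).
// `buildZenRowsUrl` and `scrapeWithZenRows` are hypothetical helper names.
const ZENROWS_ENDPOINT = 'https://api.zenrows.com/v1/';

function buildZenRowsUrl(targetUrl, apiKey) {
  const endpoint = new URL(ZENROWS_ENDPOINT);
  // Same parameters as the Axios example; searchParams handles URL encoding
  endpoint.searchParams.set('url', targetUrl);
  endpoint.searchParams.set('apikey', apiKey);
  endpoint.searchParams.set('js_render', 'true');
  endpoint.searchParams.set('premium_proxy', 'true');
  return endpoint.toString();
}

async function scrapeWithZenRows(targetUrl, apiKey) {
  const response = await fetch(buildZenRowsUrl(targetUrl, apiKey));
  if (!response.ok) {
    // Surface non-2xx responses instead of returning an error page as data
    throw new Error(`ZenRows request failed with status ${response.status}`);
  }
  return response.text();
}
```

You would call it as `scrapeWithZenRows('https://www.scrapingcourse.com/cloudflare-challenge', '<YOUR_ZENROWS_API_KEY>').then(console.log, console.error)`.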
Method 2: Puppeteer
Another way to emulate natural user behavior is by running live browser instances. This allows you to handle JavaScript challenges and also render dynamic pages like a regular browser.
Puppeteer, the most popular JavaScript headless browser library, lets you do that. It offers a high-level API for automating Chrome via the DevTools Protocol, allowing you to interact with web page elements like you would in a browser.
Moreover, you can include additional configurations like premium proxies and user agent rotation to improve your chances.
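As a rough illustration, the sketch below opens a page with Puppeteer and optionally applies a proxy and a custom user agent. The helper names `buildLaunchOptions` and `scrapeWithPuppeteer` are hypothetical, and the proxy address and user agent string are placeholders you'd supply yourself. Treat it as a starting point, not a guaranteed bypass.

```javascript
// Sketch: open a page with Puppeteer using an optional proxy and custom user agent.
// `buildLaunchOptions` and `scrapeWithPuppeteer` are hypothetical helpers;
// the proxy address and user agent are placeholders you must supply.
function buildLaunchOptions({ proxy } = {}) {
  const args = [];
  if (proxy) {
    // Chrome command-line switch that routes traffic through a proxy
    args.push(`--proxy-server=${proxy}`);
  }
  return { headless: true, args };
}

async function scrapeWithPuppeteer(url, { proxy, userAgent } = {}) {
  // npm install puppeteer (required lazily so the helper above has no dependencies)
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions({ proxy }));
  try {
    const page = await browser.newPage();
    if (userAgent) {
      await page.setUserAgent(userAgent);
    }
    // Wait until network activity settles so JavaScript-rendered content is present
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails
    await browser.close();
  }
}
```

To rotate user agents, you could pass a different `userAgent` value on each call, for example one picked at random from your own list.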
Unlike Cloudscraper, Puppeteer is actively maintained by the Chrome browser automation team and boasts one of the largest developer communities.
For more information, check out this guide on Puppeteer web scraping.
However, you should know that Puppeteer doesn't guarantee a 100% success rate. Advanced Cloudflare systems can easily detect its automation properties and deny you access.
In such cases, consider web scraping APIs like ZenRows.
Method 3: Undetected ChromeDriver
Undetected ChromeDriver (UC) patches the standard Selenium ChromeDriver to avoid triggering anti-bot detection mechanisms. This makes it harder for websites to identify your scraper.
While it's originally a Python library, you can adapt it for JavaScript web scraping.
This blog post on using Undetected ChromeDriver with Node.js provides a step-by-step guide to achieving that.
Like Puppeteer, UC has its limitations. While it may work against basic anti-bot protections, like those on home pages, advanced Cloudflare systems will block your request.
Conclusion
Although Cloudscraper is not the best option for bypassing Cloudflare using JavaScript, other options exist to get you over the hump. The three discussed in this article (ZenRows, Puppeteer, and Undetected ChromeDriver) are powerful alternatives.
However, the most effective and only surefire way to avoid getting blocked while web scraping in JavaScript is to use web scraping APIs like ZenRows.