
How to Bypass Cloudflare in NodeJS

January 24, 2023 · 6 min read

Cloudflare is a popular anti-bot system that detects and blocks bots, which makes scraping harder. However, there's hope! The best way to bypass Cloudflare in NodeJS is to use proven libraries that can mimic a real user. They are:

  • ZenRows
  • Humanoid
  • Cloudflare-scraper
  • Puppeteer
  • Puppeteer-stealth

Let's discuss the details of these scraping packages and walk through a code example of each to learn how they're used for NodeJS Cloudflare bypass.

What Is Cloudflare

Cloudflare is a security company that offers web firewalls to defend applications against several security threats, such as cross-site scripting (XSS), credential stuffing and DDoS attacks. By default, it blocks scrapers. There are different tweaks and tricks to bypass Cloudflare while scraping, so let's go ahead with the libraries in NodeJS used for this purpose.

How to Bypass Cloudflare in NodeJS

We'll scrape data in NodeJS from three websites with different levels of Cloudflare security:

  • Astra is a less-secure website with basic Cloudflare security.
  • OpenSea uses advanced Cloudflare bot detection.
  • G2, the toughest one.

Let's go ahead and scrape these sites using libraries that are capable of bypassing Cloudflare in NodeJS.

1. ZenRows

ZenRows is an all-in-one web scraping API that handles all anti-bot bypass for you, like Cloudflare and reCAPTCHA. Some of its features include rotating proxies, headless browsers, automatic retries and JavaScript rendering.

👍 Pros:

  • It bypasses anti-bot systems such as Cloudflare and reCAPTCHA for you.
  • You can run your first request in minutes.
  • It can scrape JavaScript-rendered pages.
  • It also works with other libraries.

👎 Cons:

  • It doesn't provide browser extensions.

How to Bypass Cloudflare in NodeJS Using ZenRows

Go to ZenRows and sign up for your free API key, which you'll find at the top of the dashboard.

ZenRows Dashboard

Once you have your API key, switch to your code editor and install axios into your project folder. It's a popular HTTP client, and we'll use it to send a GET request to the target websites.

npm install axios

Now, create a JavaScript file named index.js, require axios and add your ZenRows API key, like this:

// Require axios 
const axios = require("axios"); 
 
// Add your API Key 
const APIKEY = "YOUR_API_KEY";
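If you'd rather not hardcode the key in index.js, one option is to read it from an environment variable. This is only a sketch: getApiKey is a hypothetical helper and ZENROWS_API_KEY is a variable name we made up for illustration, not something ZenRows defines.

```javascript
// Hypothetical helper: read the API key from an environment variable.
// ZENROWS_API_KEY is an assumed variable name chosen for this example.
function getApiKey(env = process.env) {
	const key = env.ZENROWS_API_KEY;
	if (!key) {
		// Fail early with a clear message instead of sending a request without a key
		throw new Error("Set the ZENROWS_API_KEY environment variable first");
	}
	return key;
}
```

With that in place, `const APIKEY = getApiKey();` could stand in for the hardcoded string.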

We'll add some anti-bot parameters to the ZenRows request, send a GET request and print response.data from the response object. The .catch method handles errors, if any.

axios({ 
	// Add the API key and anti-bot features as parameters 
	url: `https://api.zenrows.com/v1/?apikey=${APIKEY}&url=https%3A%2F%2Fwww.getastra.com%2F&antibot=true&premium_proxy=true`, 
	method: "GET", 
}) 
	// Print our result 
	.then(response => console.log(response.data)) 
	// Catch error if any 
	.catch(error => console.log(error));

Here's what your full NodeJS web scraping script should look like:

// Require axios 
const axios = require("axios"); 
 
// Add your API Key 
const APIKEY = "YOUR_API_KEY"; 
 
axios({ 
	// Add the API key and anti-bot features as parameters 
	url: `https://api.zenrows.com/v1/?apikey=${APIKEY}&url=https%3A%2F%2Fwww.getastra.com%2F&antibot=true&premium_proxy=true`, 
	method: "GET", 
}) 
	// Print our result 
	.then(response => console.log(response.data)) 
	// Catch error if any 
	.catch(error => console.log(error));
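The target URL in the request above is percent-encoded by hand. A small helper can do the encoding for us, which makes swapping targets less error-prone. This is a minimal sketch: buildZenRowsUrl is a hypothetical name, and it only uses the parameters already shown in the script (apikey, url, antibot and premium_proxy).

```javascript
// Build the ZenRows request URL from its parts.
// buildZenRowsUrl is a hypothetical helper name used for illustration.
function buildZenRowsUrl(apiKey, targetUrl) {
	const params = new URLSearchParams({
		apikey: apiKey,
		url: targetUrl, // encoded automatically, e.g. ":" becomes "%3A"
		antibot: "true",
		premium_proxy: "true",
	});
	return `https://api.zenrows.com/v1/?${params.toString()}`;
}

// Prints the same URL used in the script above
console.log(buildZenRowsUrl("YOUR_API_KEY", "https://www.getastra.com/"));
```

The returned string can be passed as the url option of the axios call.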

Running the script, here's our output for Astra:

ZenRows Output for Astra

Nice! Let's do the same thing for the other two websites, just replacing the target URL. Here's the script output for OpenSea:

ZenRows Output for OpenSea

And we get a successful result for G2 as well:

ZenRows Output for G2

Amazing! With ZenRows, we were able to bypass all levels of Cloudflare security in NodeJS.


2. Humanoid

Humanoid is a NodeJS package that solves and bypasses Cloudflare anti-bot challenges. It does this by solving JavaScript challenges in the NodeJS runtime and then returning the HTML, which makes the anti-bot perceive the scraper as a normal web browser.

Some of the features it uses to solve JavaScript challenges include a random browser user-agent, auto-retry on failed challenges, and custom cookies and headers.
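To make the auto-retry idea concrete, here's a generic retry wrapper. It's an illustration of the pattern only, not Humanoid's internal code: withRetries, retries and delayMs are names we chose for this sketch.

```javascript
// Generic retry wrapper (illustrative; not Humanoid's internal code).
// Runs `task` up to `retries` times, waiting `delayMs` between attempts.
async function withRetries(task, { retries = 3, delayMs = 1000 } = {}) {
	let lastError;
	for (let attempt = 1; attempt <= retries; attempt++) {
		try {
			// Return as soon as one attempt succeeds
			return await task(attempt);
		} catch (error) {
			lastError = error;
			if (attempt < retries) {
				// Pause before the next attempt
				await new Promise(resolve => setTimeout(resolve, delayMs));
			}
		}
	}
	// Every attempt failed: surface the last error
	throw lastError;
}
```

Any promise-returning request can be wrapped this way, for example `withRetries(() => humanoid.get(url))`.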

👍 Pros:

  • It's easy to use.
  • Humanoid solves JavaScript challenges automatically.
  • It supports asynchronous JavaScript.

👎 Cons:

  • It hasn't been updated for a long time, so it's hard to get support for a bug or guidance from other developers using it.
  • Unlike headless browsers, which were developed for web testing and accessing page elements, Humanoid doesn't support these features, so you'll need to do more work to scrape with it.

How to Bypass Cloudflare in NodeJS Using Humanoid

To bypass Cloudflare in NodeJS with Humanoid, install and require it. Then create a new Humanoid instance:

// npm install humanoid-js 
const Humanoid = require("humanoid-js"); 
 
// Create a new humanoid instance 
const humanoid = new Humanoid();

The next step is to send a GET request to the target website using humanoid and a randomly created user-agent header. The Cloudflare bypass feature is set to true by default, and the auto-bypass method solves these challenges and retries on failed ones.

// Send GET request to the target website (no semicolon: .then is chained next) 
humanoid.get("https://www.getastra.com/")

Get the script response and print out res.body, which contains the HTML of the page. Use the .catch method to catch possible errors.

	.then(res => { 
		console.log(res.body); // Print the result 
	}) 
	// Catch errors if any 
	.catch(err => { 
		console.log(err) 
	})

Go ahead and run the script on the target website. Here's our output on Astra:

Humanoid Output for Astra

We successfully avoided Cloudflare in NodeJS using Humanoid! Here is what the complete code looks like:

const Humanoid = require("humanoid-js"); 
 
// Create a new humanoid instance 
const humanoid = new Humanoid(); 
 
// Send Get request to the target website 
humanoid.get("https://www.getastra.com/") 
	.then(res => { 
		console.log(res.body); // Print the result 
	}) 
	// Catch errors if any 
	.catch(err => { 
		console.log(err) 
	})

Using Humanoid with our other target sites, here's our output on OpenSea:

Humanoid Output for OpenSea

And lastly, G2.com:

Humanoid Output for G2

Oops, it seems that we didn't pass! We failed to establish a connection because of Cloudflare, so we didn't get a response.

3. Cloudflare-scraper

Cloudflare-scraper is a plugin that works on top of Puppeteer and it's capable of bypassing Cloudflare JavaScript challenges. It lets users add cookies to requests, as well as proxies and user-agent headers.

👍 Pros:

  • It works on top of Puppeteer.
  • It can solve Cloudflare JavaScript and CAPTCHA challenges.
  • It's capable of adding proxies to the request.

👎 Cons:

  • The lack of documentation makes it difficult to configure.
  • Cloudflare-scraper is powerless against advanced bot detections.

How to Bypass Cloudflare in NodeJS Using Cloudflare-scraper

To bypass Cloudflare detection in NodeJS, install cloudflare-scraper and puppeteer into your project folder and then require cloudflare-scraper, like this:

// npm install cloudflare-scraper puppeteer 
 
// Require cloudflare-scraper 
const cloudflareScraper = require("cloudflare-scraper");

The next step is to send a GET request to the target website. To do this, create a try-catch statement: send the request with Cloudflare-scraper inside the try block, save the result into response and print it out. The catch block handles any errors.

(async () => { 
	try { 
		// Send GET request to the target website 
		const response = await cloudflareScraper.get("https://www.getastra.com/"); 
 
		// Print out results 
		console.log(response); 
	} catch (error) { 
		// Handle errors 
		console.log(error); 
	} 
})();

Here's what the complete cloudflare-scraper script should look like:

// Require cloudflare-scraper 
const cloudflareScraper = require("cloudflare-scraper"); 
 
(async () => { 
	try { 
		// Send GET request to the target website 
		const response = await cloudflareScraper.get("https://www.getastra.com/"); 
 
		// Print out results 
		console.log(response); 
	} catch (error) { 
		// Handle errors 
		console.log(error); 
	} 
})();

Using the script on the target websites, here's our output from Astra:

Cloudflare-scraper Output for Astra

We were successful on Astra, but sadly OpenSea returned a 403 status code. We got blocked!

Cloudflare-scraper Output for OpenSea

No luck on G2.com either:

Cloudflare-scraper Output for G2
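Since a block showed up as a 403 status code here, a scraper can check for it instead of relying on us reading the output. The helper below is a sketch under assumptions: looksBlocked is a hypothetical name, 403 comes from the OpenSea result above, and 503 is added as an assumption because challenge pages are often reported with that code.

```javascript
// Status codes treated as "blocked". 403 is what we saw on OpenSea;
// 503 is an assumption added for illustration.
const BLOCKED_STATUS_CODES = new Set([403, 503]);

// Hypothetical helper: returns true when a response looks like a block.
function looksBlocked(response) {
	// Different HTTP clients expose the code as `status` or `statusCode`
	const status = response.status ?? response.statusCode;
	return BLOCKED_STATUS_CODES.has(status);
}

console.log(looksBlocked({ status: 403 })); // true
console.log(looksBlocked({ statusCode: 200 })); // false
```

A check like this makes it easy to decide when to retry with a different tool.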

4. Puppeteer

Puppeteer is a popular NodeJS library that provides a high-level API to headless browsers based on Chromium. It provides real browser functionality, like visiting pages, clicking links and submitting forms.

👍 Pros:

  • It gives full control of the headless browser.
  • Proxies and headers can be added to Puppeteer.
  • It also allows users to mimic normal user behavior by limiting requests and waiting for a random time before opening a new page.

👎 Cons:

  • It can be easily detected.
  • It's difficult to debug scripts that run in a headless browser.

How to Bypass Cloudflare in NodeJS Using Puppeteer

To avoid detection with Puppeteer, install puppeteer into your project folder and include it:

// npm install puppeteer 
 
// Require puppeteer 
const puppeteer = require("puppeteer"); 

The next step is to launch puppeteer using an async function. Do this by creating a new headless browser in the function using the puppeteer.launch() method. Then add a new page tab inside the headless browser.

(async () => { 
	// Initiate the browser 
	const browser = await puppeteer.launch(); 
 
	// Create a new page with the default browser context 
	const page = await browser.newPage();

Use page.setViewport to set the viewport size of the headless browser and await page.goto to tell puppeteer to go to the target website. Set a wait time of one second (1000 ms) using page.waitForTimeout.

	// Setting page view 
	await page.setViewport({ width: 1280, height: 720 }); 
 
	// Go to the target website 
	await page.goto("https://www.getastra.com/"); 
 
	// Wait for security check 
	await page.waitForTimeout(1000);

The last step is getting a response, so take a screenshot using the .screenshot function. Then close the headless browser.

	// Take screenshot 
	await page.screenshot({ path: "image.png" }); 
 
	// Closes the browser and all of its pages 
	await browser.close(); 
})();

Here's what the complete code should look like:

// Require puppeteer 
const puppeteer = require("puppeteer"); 
 
(async () => { 
	// Initiate the browser 
	const browser = await puppeteer.launch(); 
 
	// Create a new page with the default browser context 
	const page = await browser.newPage(); 
 
	// Setting page view 
	await page.setViewport({ width: 1280, height: 720 }); 
 
	// Go to the target website 
	await page.goto("https://www.getastra.com/"); 
 
	// Wait for security check 
	await page.waitForTimeout(1000); 
 
	// Take screenshot 
	await page.screenshot({ path: "image.png" }); 
 
	// Closes the browser and all of its pages 
	await browser.close(); 
})();
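The script above waits a fixed amount of time. To get closer to the "random wait" mentioned in the pros, we could swap in a small helper like the one below. It's a sketch with hypothetical names (sleep, randomDelayMs, randomSleep) built on plain setTimeout, so it doesn't depend on page.waitForTimeout, which has been deprecated and removed in newer Puppeteer releases.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
	return new Promise(resolve => setTimeout(resolve, ms));
}

// Pick a whole number of milliseconds between minMs and maxMs (inclusive).
function randomDelayMs(minMs, maxMs) {
	return Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
}

// Wait for a random time in the given range.
function randomSleep(minMs, maxMs) {
	return sleep(randomDelayMs(minMs, maxMs));
}
```

Inside the async function, `await randomSleep(1000, 3000);` could replace the fixed wait. The range is an arbitrary example, not a value tuned against Cloudflare.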

Here's the output for Astra:

Puppeteer Output for Astra

We successfully bypassed Cloudflare with Puppeteer in NodeJS! But we aren't done. Let's run the script on OpenSea and G2.com. Here's the script output for OpenSea:

Puppeteer Output for OpenSea

Well, that's a fail! And we got the same Access Denied screen when targeting G2:

Puppeteer Output for G2

5. Puppeteer-stealth

Puppeteer-stealth is a stealth plugin that helps Puppeteer scrapers bypass anti-bots. It uses multiple evasion techniques to bypass Cloudflare using NodeJS, like overriding JS objects in the browser and changing the user-agent header.
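To give a feel for what "overriding JS objects" means, here's a simplified sketch. It isn't the plugin's actual code: it runs in plain NodeJS against a stand-in object (fakeNavigator) rather than a real browser, and hideAutomationHints is a hypothetical name.

```javascript
// Stand-in for the browser's `navigator` object so the sketch runs in plain NodeJS.
// The user-agent string is a shortened placeholder, not a real one.
const fakeNavigator = {
	webdriver: true,
	userAgent: "Mozilla/5.0 HeadlessChrome",
};

// Hypothetical helper: hide two common automation hints.
function hideAutomationHints(nav) {
	// Automated browsers report navigator.webdriver as true; override it
	Object.defineProperty(nav, "webdriver", { get: () => false, configurable: true });

	// Drop the "Headless" marker from the user-agent string
	const cleanedUserAgent = nav.userAgent.replace("HeadlessChrome", "Chrome");
	Object.defineProperty(nav, "userAgent", { get: () => cleanedUserAgent, configurable: true });

	return nav;
}

hideAutomationHints(fakeNavigator);
console.log(fakeNavigator.webdriver, fakeNavigator.userAgent);
```

The real plugin applies many more evasions inside the page itself, so treat this only as an illustration of the idea.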

👍 Pros:

  • All the pros mentioned for Puppeteer.
  • Harder to detect.

👎 Cons:

How to Bypass Cloudflare in NodeJS Using Puppeteer-stealth

To avoid Cloudflare bot detection using Puppeteer-stealth, install the packages into your project folder, require puppeteer-extra and the stealth plugin, import executablePath from puppeteer and enable the plugin using puppeteer.use(pluginStealth()):

// npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth 
 
const puppeteer = require("puppeteer-extra"); 
 
// Add stealth plugin and use defaults 
const pluginStealth = require("puppeteer-extra-plugin-stealth"); 
const { executablePath } = require("puppeteer"); 
 
// Use stealth 
puppeteer.use(pluginStealth());

The next step is to launch a new puppeteer headless browser and use browser.newPage() to open a new page in it. Then set the viewport size of the headless browser using page.setViewport. Finally, use page.goto() to visit the target website.

// Launch puppeteer-stealth 
puppeteer.launch({ executablePath: executablePath() }).then(async browser => { 
	// Create a new page 
	const page = await browser.newPage(); 
 
	// Setting page view 
	await page.setViewport({ width: 1280, height: 720 }); 
 
	// Go to the website 
	await page.goto("https://www.getastra.com/");

With that, your scraper is ready and all that's left is to wait for the page to load. Take a screenshot and close the browser. Here's how we did that:

	// Wait for the page to load 
	await page.waitForTimeout(1000); 
 
	// Take screenshot 
	await page.screenshot({ path: "image.png" }); 
 
	// Close the browser 
	await browser.close(); 
});

The complete code is ready to run and it looks like this:

const puppeteer = require("puppeteer-extra"); 
 
// Add stealth plugin and use defaults 
const pluginStealth = require("puppeteer-extra-plugin-stealth"); 
const { executablePath } = require("puppeteer"); 
 
// Use stealth 
puppeteer.use(pluginStealth()); 
 
// Launch puppeteer-stealth 
puppeteer.launch({ executablePath: executablePath() }).then(async browser => { 
	// Create a new page 
	const page = await browser.newPage(); 
 
	// Setting page view 
	await page.setViewport({ width: 1280, height: 720 }); 
 
	// Go to the website 
	await page.goto("https://www.getastra.com/"); 
 
	// Wait for the page to load 
	await page.waitForTimeout(1000); 
 
	// Take screenshot 
	await page.screenshot({ path: "image.png" }); 
 
	// Close the browser 
	await browser.close(); 
});
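The script saves every screenshot as image.png, so running it against a second site overwrites the first result. One way around that is to derive the file name from the target URL. This is a sketch: screenshotPathFor is a hypothetical helper, and only the Astra URL appears in the scripts above, so the OpenSea and G2 URLs are assumptions.

```javascript
// Target sites. Only the Astra URL appears in the scripts above;
// the other two are assumed for illustration.
const targets = [
	"https://www.getastra.com/",
	"https://opensea.io/",
	"https://www.g2.com/",
];

// Hypothetical helper: name the screenshot after the site's hostname.
function screenshotPathFor(targetUrl) {
	const { hostname } = new URL(targetUrl);
	return `${hostname}.png`;
}

for (const target of targets) {
	console.log(`${target} -> ${screenshotPathFor(target)}`);
}
```

In the scraper, `page.screenshot({ path: screenshotPathFor(url) })` would then keep one image per site.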

Running the puppeteer-stealth scraper on Astra, we got this:

Puppeteer-Stealth Output for Astra

Good news! It cleared the first hurdle, and the output from OpenSea follows the same positive direction:

Puppeteer-Stealth Output for OpenSea

But running the scraper on G2.com, here's what we got:

Puppeteer-Stealth Output for G2

The advanced Cloudflare bot detection on G2 spotted Puppeteer-stealth and blocked it from accessing the site.

Conclusion

Learning how to bypass Cloudflare using NodeJS is essential for your web scraping project, as Cloudflare can detect, block and even throttle your web crawler. In this article, we discussed the five best libraries to solve this problem. We used these libraries to scrape three websites with different levels of Cloudflare bot protection, and here's what we got:

Library              Astra   OpenSea   G2.com
ZenRows              ✅      ✅        ✅
Humanoid             ✅      ✅        -
Cloudflare-scraper   ✅      -         -
Puppeteer            ✅      -         -
Puppeteer-stealth    ✅      ✅        -

Although Humanoid, Cloudflare-scraper, Puppeteer and Puppeteer-stealth are libraries often used for NodeJS Cloudflare bypass, they failed to bypass the advanced bot protection on G2.com. Meanwhile, ZenRows was able to scrape the data thanks to its advanced anti-bot bypass features, like smart rotating proxies and customized headless browsers. It integrates well with NodeJS and you can get started for free.
