Web Crawling Webinar for Tech Teams

How to Bypass CAPTCHA With Selenium and Node.js

Yuvraj Chandra
October 4, 2024 · 3 min read

CAPTCHAs are a web scraper's nightmare. Those annoying pop-ups will likely stop your scraper in its tracks, especially if it uses automation tools such as Selenium.

In this tutorial, you'll learn the best methods for bypassing CAPTCHAs using Selenium in Node.js.

Can Selenium With Node.js Handle CAPTCHA?

Yes, you can configure Selenium in Node.js to bypass CAPTCHAs. Depending on the target website's configuration, you have two options.

Some websites only display CAPTCHAs when their anti-bot system detects suspicious activities, such as requests from an automated browser. This scenario allows you to avoid the CAPTCHA altogether by emulating natural user behavior, ensuring you don't trigger the anti-bot system.

Other websites, however, use CAPTCHAs as part of the initial web page configuration to regulate bot traffic, showing them to all users. In that case, you must solve the CAPTCHA to gain access to the website's content. For automated web scrapers, this typically involves third-party services that pay humans to solve the CAPTCHA for you.

While both cases are doable, investing in third-party services can get expensive and challenging to scale. That's why we'll focus on the more effective technique of preventing CAPTCHAs from appearing in the first place.

We'll introduce two methods to achieve that. One is free but doesn't work for every situation. The other is paid and 100% foolproof. Let's go!


Method #1: Use Undetected ChromeDriver With Selenium and Node.js

Let's start with the free method: using Undetected ChromeDriver with Selenium.

To understand how Undetected ChromeDriver works, let's take a step back and analyze base Selenium.

Selenium uses the ChromeDriver, a small executable file, to control Chromium browsers. This file serves as a bridge between the Selenium WebDriver and the browser you're automating.

The problem with using the standard ChromeDriver is that it leaks a lot of information to the target server. If the website uses anti-bot protection, you'll likely get flagged and face an unsolvable challenge, such as Cloudflare's Turnstile CAPTCHA. This is a common issue since Cloudflare is one of the most widely used WAFs. Learn more about bypassing Cloudflare.

This is where Undetected ChromeDriver, a customized version of the standard ChromeDriver, plays a key role. It uses various techniques, such as fingerprint spoofing and masking automation flags, to make Selenium appear more human than its base version.

By imitating natural user behavior, Undetected ChromeDriver can help you bypass CAPTCHAs. However, it doesn't always guarantee success, as its attempt to appear human only works for websites with basic protection.
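
As a rough illustration of what "masking automation flags" means in practice, here are a couple of Chrome launch flags such setups commonly pass. This is only a sketch: the exact flags Undetected ChromeDriver uses vary by release, and the selenium-webdriver calls are shown in comments for context.

```javascript
// Chrome launch flags commonly used to mask automation signals.
// Illustrative only; not Undetected ChromeDriver's actual flag list.
const stealthArgs = [
  // Hides the navigator.webdriver flag in many Chrome versions
  '--disable-blink-features=AutomationControlled',
  // A typical desktop viewport instead of the headless default
  '--window-size=1366,768',
];

// With plain selenium-webdriver, you would apply them like this:
// const chrome = require('selenium-webdriver/chrome');
// const options = new chrome.Options()
//   .addArguments(...stealthArgs)
//   .excludeSwitches('enable-automation');

console.log(stealthArgs.join(' '));
```

Flags like these reduce obvious automation leaks, but they don't change deeper fingerprints, which is why this approach only defeats basic protection.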

For more details, including a step-by-step setup, check out this guide on using Undetected ChromeDriver with Node.js.

While Undetected ChromeDriver may work against simpler CAPTCHAs and basic detection systems, it fails against websites with advanced anti-bot protections. For example, it won't be able to bypass the Antibot Challenge page, which is protected by advanced anti-bot measures.

However, there's a way to deal even with the most difficult CAPTCHA blocks: a web scraping API.

Method #2: Bypass CAPTCHA With ZenRows

The best solution for avoiding CAPTCHAs is to use ZenRows' Universal Scraper API. It handles header management, premium proxy rotation, and JavaScript rendering for you, automatically bypasses CAPTCHAs, and provides everything you need to avoid getting blocked.

Getting started with ZenRows is extremely easy. Let's see how it works against the Antibot Challenge page, a webpage protected by anti-bot measures.

Start by signing up for a new account to get to the Request Builder.

building a scraper with zenrows

Insert the target URL into the link box, enable JS Rendering, and activate Premium Proxies.

Next, choose Node.js and then click on the API connection mode. After that, copy the generated code and paste it into your script.

scraper.js
const axios = require('axios');

const params = {
    url: 'https://www.scrapingcourse.com/antibot-challenge',
    apikey: '<YOUR_ZENROWS_API_KEY>',
    js_render: 'true',
    premium_proxy: 'true',
};

axios.get('https://api.zenrows.com/v1/', { params })
    .then(({ data }) => console.log(data))
    .catch((error) => console.error(error));

The generated code uses the Axios library as the HTTP client. You can install this library using the following command:

Terminal
npm install axios
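
If you'd rather skip the extra dependency, Node 18+ ships a built-in fetch, so you can build the same request with the standard URLSearchParams class. The API key below is the same placeholder as in the generated snippet; the actual request line is left commented until you plug in a real key.

```javascript
// Build the ZenRows request URL with Node's standard URLSearchParams.
// '<YOUR_ZENROWS_API_KEY>' is a placeholder, as in the snippet above.
const params = new URLSearchParams({
    url: 'https://www.scrapingcourse.com/antibot-challenge',
    apikey: '<YOUR_ZENROWS_API_KEY>',
    js_render: 'true',
    premium_proxy: 'true',
});

const requestUrl = `https://api.zenrows.com/v1/?${params}`;
console.log(requestUrl);

// Uncomment once a real API key is in place (requires Node 18+ for fetch):
// fetch(requestUrl)
//     .then((res) => res.text())
//     .then((html) => console.log(html));
```

Either client works; the API behaves the same regardless of how you send the GET request.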

When you run this code, you'll successfully access the page:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! 🎉 You've successfully bypassed the anti-bot challenge page using ZenRows. This works for any website.

Conclusion

Emulating natural user behavior allows you to scrape without triggering CAPTCHA challenges. This article discussed a free and a paid method of achieving that. However, while free solutions may work for simpler web scraping tasks, they usually fail against advanced anti-bot systems, especially at scale.

You need a web scraping API like ZenRows to bypass all CAPTCHAs and scrape any website without getting blocked. Try ZenRows for free.

Ready to get started?

Up to 1,000 URLs for free are waiting for you