How to Bypass CAPTCHA With Selenium in Java?

Rubén del Campo
October 7, 2024 · 3 min read

Running into CAPTCHAs while web scraping is frustrating, and a simple scraper can't get past them on its own. Luckily, there are a few solutions to this problem.

In this tutorial, you'll learn how to bypass CAPTCHAs with Selenium in Java using two methods: Undetected ChromeDriver and a web scraping API.

Can Selenium and Java Handle CAPTCHA?

Yes, Selenium Java is adaptable and can handle CAPTCHAs. However, your approach should depend on the target website's configuration.

Some websites only display CAPTCHA challenges when they suspect bot-like activities, while others use CAPTCHAs as part of their initial configuration to immediately mitigate automated access.

In the first case, you can avoid the challenge altogether by emulating natural browsing behavior. In the second scenario, the CAPTCHA needs to be solved by a human being.

While you can configure Selenium Java for both scenarios, solving CAPTCHAs can be challenging to scale. That's why we'll focus on the more effective approach, which is preventing CAPTCHAs from appearing in the first place.
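One small ingredient of emulating natural browsing behavior is avoiding perfectly regular timing between actions. The following is a minimal sketch of that idea only. The `HumanDelay` class and its method names are hypothetical helpers, not part of Selenium, and randomized pauses alone are unlikely to get past an anti-bot system.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper (not part of Selenium): picks a randomized pause so that
// page actions are not spaced at perfectly regular intervals.
final class HumanDelay {

    // returns a pause in milliseconds, uniformly chosen from [minMillis, maxMillis]
    static long nextPauseMillis(long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("expected 0 <= minMillis <= maxMillis");
        }
        // the upper bound of nextLong is exclusive, so add 1 to include maxMillis
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1);
    }

    // sleeps for a randomized pause and returns how long it slept
    static long pause(long minMillis, long maxMillis) throws InterruptedException {
        long millis = nextPauseMillis(minMillis, maxMillis);
        Thread.sleep(millis);
        return millis;
    }

    public static void main(String[] args) throws InterruptedException {
        // in a scraper, this call would sit between two browser actions
        long slept = pause(200, 600);
        System.out.println("Paused for " + slept + " ms");
    }
}
```

In a Selenium script, a call such as `HumanDelay.pause(200, 600)` could replace a fixed `Thread.sleep(...)` between navigation and interaction steps.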


Method #1: Use Undetected ChromeDriver With Selenium and Java

Modern websites often employ anti-bot mechanisms that can easily detect traditional WebDriver tools and block your web scraper. This makes it challenging to emulate natural browsing behavior using Selenium.

Luckily, Undetected ChromeDriver (UC) can help. By patching Selenium's automation properties, UC makes it harder for websites to detect automated requests, enabling you to scrape without triggering CAPTCHA challenges.
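To give a rough sense of what "automation properties" means, the sketch below collects two widely used Chrome settings that reduce obvious automation hints. This is not how Undetected ChromeDriver works internally, and the `StealthFlags` class is a hypothetical container. It assumes you would pass these values to Selenium's `ChromeOptions` yourself, and they are unlikely to be enough against the challenge used in this tutorial.

```java
import java.util.List;
import java.util.Map;

// Hypothetical container for Chrome settings that hide common automation hints.
// This is a manual tweak for illustration, not Undetected ChromeDriver's own patching.
final class StealthFlags {

    // command-line arguments, intended for ChromeOptions.addArguments(List<String>)
    static List<String> arguments() {
        return List.of(
                // stops Blink from exposing the AutomationControlled feature
                "--disable-blink-features=AutomationControlled",
                // same window size the tutorial uses
                "--window-size=1920,1080");
    }

    // experimental options, intended for ChromeOptions.setExperimentalOption(name, value)
    static Map<String, Object> experimentalOptions() {
        // asks ChromeDriver not to launch Chrome with the enable-automation switch
        return Map.of("excludeSwitches", List.of("enable-automation"));
    }

    public static void main(String[] args) {
        // with Selenium on the classpath, usage would look like:
        //   ChromeOptions options = new ChromeOptions();
        //   options.addArguments(StealthFlags.arguments());
        //   StealthFlags.experimentalOptions().forEach(options::setExperimentalOption);
        arguments().forEach(System.out::println);
        System.out.println(experimentalOptions());
    }
}
```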

While Undetected ChromeDriver is originally a Python library, community-driven projects like Undetected-chromedriver for Java allow you to leverage its functionalities for free in your Java project.

To learn how to do it, let's try to scrape Antibot Challenge, an antibot-protected test website that presents the CAPTCHA when it suspects bot traffic.

But before we dive in, let's see how base Selenium works against this target website.

Example
package com.example;
 
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
 
import java.io.File;
import java.io.IOException;
 
public class Main {
    public static void main(String[] args) {
 
        // set Chrome options for headless mode
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
 
        WebDriver driver = new ChromeDriver(options);
 
        try {
            // navigate to the target URL
            driver.get("https://www.scrapingcourse.com/antibot-challenge");
 
            // wait for 3 seconds (3000 milliseconds)
            Thread.sleep(3000);
 
            // take a screenshot of the page
            File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            // save the screenshot to the desired location
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
 
            System.out.println("Screenshot saved as screenshot.png");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            // close the browser
            driver.quit();
        }
    }
}

Here's the result:

scrapingcourse cloudflare blocked screenshot

The website flags the Selenium script as a bot and refuses to verify the request. This proves that you must fortify Selenium to bypass this CAPTCHA.

Now, let's integrate Undetected ChromeDriver into the Java project.

Add the following XML snippet in your pom.xml dependencies section to get started.

pom.xml
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-chrome-driver</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-support</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-api</artifactId>
    <version>4.8.2</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-chrome-driver</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-support</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Then, copy the undetected-chromedriver-for-java source code and integrate it into your project's directory structure.

If your project only contains the initial Selenium script, you only need to replace the src folder in the project root directory with that of the library.

After that, navigate to the Test.java file within the new src directory. You'll find the following boilerplate code.

Test.java
import com.frogking.chromedriver.ChromeDriverBuilder;
 
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
 
import java.io.File;
 
public class Test {
 
    public static void main(String[] args){
 
        String driver_home = "your driver home";
 
        // 1  if use chromeOptions, recommend use this
        // ChromeDriverBuilder could throw RuntimeError, you can catch it, *catch it is unnecessary
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        //chrome_options.addArguments("--headless=new"); when chromedriver > 108.x.x.x
        //chrome_options.addArguments("--headless=chrome"); when chromedriver <= 108.x.x.x
 
        ChromeDriverService service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File(driver_home))
                .usingAnyFreePort()
                .build();
 
        //ChromeDriver chromeDriver1 = new ChromeDriver(service);
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options,driver_home);
 
        // 2  don't use chromeOptions
        //ChromeDriver chromeDriver2 = new ChromeDriverBuilder()
        //        .build("your driver home");
 
        chromeDriver1.get("your url");
        //chromeDriver2.get("your url");
 
    }
}

This code already imports the ChromeDriverBuilder class, which the Java library provides for patching the standard ChromeDriver. To verify that it works, navigate to the target website (https://www.scrapingcourse.com/antibot-challenge) and take a screenshot.

But first, replace the "your driver home" placeholder with the path to your ChromeDriver executable. Then, enter the target URL and add the code to take a screenshot.

You'll have the following final code:

Test.java
import com.frogking.chromedriver.ChromeDriverBuilder;
 
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
 
import java.io.File;
import java.io.IOException;
 
public class Test {
 
    public static void main(String[] args){
 
        String driver_home = "C:\\path\\to\\Chromedriver.exe";
 
        // define the necessary chrome options
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        chrome_options.addArguments("--headless=new");
 
        // create a ChromeDriverBuilder instance and build the patched ChromeDriver
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options,driver_home);
 
        // navigate to target website
        chromeDriver1.get("https://www.scrapingcourse.com/antibot-challenge");
 
        try {
 
            // wait for 3 seconds (3000 milliseconds)
            Thread.sleep(3000);
 
            // take a screenshot of the page
            File scrFile = ((TakesScreenshot) chromeDriver1).getScreenshotAs(OutputType.FILE);
            // save the screenshot to the desired location
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
 
            System.out.println("Screenshot saved as screenshot.png");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            // close the browser
            chromeDriver1.quit();
        }
 
    }
}

Run it, and you'll get the following screenshot.png file:

scrapingcourse cloudflare blocked screenshot

This means Undetected ChromeDriver is stuck on the CAPTCHA page.
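Instead of judging the result from a screenshot, you could also check the page source for the heading that the test page shows once the challenge is passed (the same heading appears in the output in Method #2). The sketch below assumes that heading text stays unchanged. The `ChallengeCheck` class is a hypothetical helper, and the "blocked" sample HTML is made up for illustration.

```java
// Hypothetical helper: decides whether a page's HTML shows the success heading
// of the Antibot Challenge test page.
final class ChallengeCheck {

    // heading text displayed by the test page after the challenge is passed
    static final String SUCCESS_MARKER = "You bypassed the Antibot challenge!";

    // returns true only if the HTML contains the success heading
    static boolean passedChallenge(String pageSource) {
        return pageSource != null && pageSource.contains(SUCCESS_MARKER);
    }

    public static void main(String[] args) {
        // with Selenium, the argument would come from chromeDriver1.getPageSource()
        String blocked = "<html><body><h1>Placeholder challenge page</h1></body></html>"; // made-up sample
        String passed = "<h2>\n    You bypassed the Antibot challenge! :D\n</h2>";

        System.out.println("blocked sample passed? " + passedChallenge(blocked));
        System.out.println("success sample passed? " + passedChallenge(passed));
    }
}
```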

While Undetected ChromeDriver was once a go-to solution for web scraping, it has fallen behind in the arms race against anti-bot systems. Luckily, there are solutions for bypassing CAPTCHAs and anti-bots, such as a web scraping API.

Method #2: Bypass CAPTCHA With a Web Scraping API

We've seen that free and open-source solutions are often insufficient, as websites using sophisticated anti-bot systems can easily detect and block them.

Fortunately, web scraping APIs, like ZenRows, offer the ultimate solution for bypassing all CAPTCHAs.

With features such as auto-rotating user agents, premium proxies, geolocation, and more, ZenRows provides the complete toolkit for scraping any website without getting blocked.

ZenRows also offers headless browser functionality, which means it can replace Selenium altogether. It allows you to interact with page elements and emulate natural browsing behavior without additional overhead.

Below is a step-by-step guide on how to use ZenRows against the Antibot Challenge page where Undetected ChromeDriver failed.

Sign up for free, and you'll be redirected to the Request Builder page:

building a scraper with zenrows

Input the target URL (https://www.scrapingcourse.com/antibot-challenge) and activate Premium Proxies and the JS rendering mode.

That'll generate your request code on the right. Copy it, and use your preferred HTTP client. This example uses the Fluent API of the Apache HTTP library, which you can add to your Maven project by including the following XML snippet in your pom.xml file.

pom.xml
<!-- https://mvnrepository.com/artifact/org.apache.httpcomponents.client5/httpclient5-fluent -->
<dependency>
    <groupId>org.apache.httpcomponents.client5</groupId>
    <artifactId>httpclient5-fluent</artifactId>
    <version>5.3.1</version>
</dependency>

Your code should be similar to this:

Example
import org.apache.hc.client5.http.fluent.Request;
 
public class APIRequest {
    public static void main(final String... args) throws Exception {
        String apiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fantibot-challenge&js_render=true&premium_proxy=true";
        String response = Request.get(apiUrl)
                .execute().returnContent().asString();
 
        System.out.println(response);
    }
}

Run it, and you'll get the HTML content of your target web page:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Good job!
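The request URL above is hard-coded with an already-encoded target. If you want to reuse the request for other pages, a small helper can build the URL and encode the target for you. This is a minimal sketch that uses only the query parameters shown in the example (`apikey`, `url`, `js_render`, `premium_proxy`). The `ZenRowsUrlBuilder` class and the `ZENROWS_API_KEY` environment variable are hypothetical names chosen for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the API request URL used in the example above.
final class ZenRowsUrlBuilder {

    static final String ENDPOINT = "https://api.zenrows.com/v1/";

    // encodes the API key and target URL so they are safe inside a query string
    static String build(String apiKey, String targetUrl) {
        return ENDPOINT
                + "?apikey=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&url=" + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8)
                + "&js_render=true"
                + "&premium_proxy=true";
    }

    public static void main(String[] args) {
        // assumption: the key is stored in an environment variable named ZENROWS_API_KEY
        String apiKey = System.getenv().getOrDefault("ZENROWS_API_KEY", "YOUR_ZENROWS_API_KEY");
        String apiUrl = build(apiKey, "https://www.scrapingcourse.com/antibot-challenge");

        // pass apiUrl to Request.get(apiUrl) as in the example above
        System.out.println(apiUrl);
    }
}
```

Reading the key from the environment also keeps it out of your source code.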

Conclusion

To bypass CAPTCHAs using Selenium, you must completely emulate natural user behavior. While Undetected ChromeDriver can patch most of Selenium's automation properties, advanced anti-bot systems can still detect it.

For guaranteed results, you need ZenRows, a web scraping API that provides an all-in-one solution for bypassing all CAPTCHAs and scraping any website. Try ZenRows for free.
