How to Bypass CAPTCHA With Selenium in Java?

Rubén del Campo
October 7, 2024 · 3 min read

Running into CAPTCHAs while web scraping is frustrating, and a simple scraper can't get past them. Luckily, there are a few easy solutions to this problem.

In this tutorial, you'll learn how to bypass CAPTCHAs with Selenium in Java using two methods: Undetected ChromeDriver and a web scraping API.

Can Selenium and Java Handle CAPTCHA?

Yes, Selenium with Java is flexible enough to handle CAPTCHAs. However, the right approach depends on how the target website is configured.

Some websites only display CAPTCHA challenges when they suspect bot-like activity, while others show a CAPTCHA upfront to block automated access immediately.

In the first case, you can avoid the challenge altogether by emulating natural browsing behavior. In the second scenario, the CAPTCHA needs to be solved by a human being.

You can configure Selenium for both scenarios when web scraping in Java, but solving CAPTCHAs is hard to scale. That's why we'll focus on the more effective approach: preventing CAPTCHAs from appearing in the first place.


Method #1: Use Undetected ChromeDriver With Selenium and Java

Modern websites often employ anti-bot mechanisms that can easily detect traditional WebDriver tools and block your web scraper. This makes it challenging to emulate natural browsing behavior using Selenium.

Luckily, Undetected ChromeDriver (UC) can help. By patching Selenium's automation properties, UC makes it harder for websites to detect automated requests, enabling you to scrape without triggering CAPTCHA challenges.

Undetected ChromeDriver started as a Python library, but community-driven projects like undetected-chromedriver for Java let you use its functionality for free in your Java projects.

To see how it works, let's try to scrape the Antibot Challenge page, an anti-bot-protected test website that presents a CAPTCHA when it suspects bot traffic.

But before we dive in, let's see how plain Selenium performs against this target website. The following script runs Chrome in headless mode and saves a screenshot with Apache Commons IO's FileUtils, so make sure commons-io is among your project's dependencies.

Example
package com.example;
 
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
 
import java.io.File;
import java.io.IOException;
 
public class Main {
    public static void main(String[] args) {
 
        // set Chrome options for headless mode
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
 
        WebDriver driver = new ChromeDriver(options);
 
        try {
            // navigate to the target URL
            driver.get("https://www.scrapingcourse.com/antibot-challenge");
 
            // wait for 3 seconds (3000 milliseconds)
            Thread.sleep(3000);
 
            // take a screenshot of the page
            File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            // save the screenshot to the desired location
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
 
            System.out.println("Screenshot saved as screenshot.png");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            // close the browser
            driver.quit();
        }
    }
}

Here's the result:

scrapingcourse cloudflare blocked screenshot

The website flags the Selenium script as a bot and refuses to verify the request. This shows that plain Selenium needs reinforcement to get past this CAPTCHA.
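Rather than sleeping for a fixed 3 seconds and inspecting a screenshot by eye, you could poll the page source until the test page's success heading appears or a timeout expires. Here's a minimal sketch, assuming the success text matches the heading shown in the ZenRows output later in this tutorial. ChallengeCheck and its methods are hypothetical names, not part of Selenium.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper class: not part of Selenium or the original tutorial code.
final class ChallengeCheck {

    // Assumed success text: the heading the test page shows once the challenge is passed.
    static final String SUCCESS_TEXT = "You bypassed the Antibot challenge!";

    private ChallengeCheck() {}

    // Returns true if the given HTML contains the success heading text.
    static boolean isChallengePassed(String html) {
        return html != null && html.contains(SUCCESS_TEXT);
    }

    // Checks the condition every `interval` until it returns true or `timeout` elapses.
    // Returns whether the condition was met before the deadline.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Usage with Selenium (driver is your WebDriver instance):
    // boolean passed = ChallengeCheck.waitUntil(
    //         () -> ChallengeCheck.isChallengePassed(driver.getPageSource()),
    //         Duration.ofSeconds(10), Duration.ofMillis(500));
}
```

If you prefer a built-in option, Selenium 4's WebDriverWait class (in selenium-support) provides similar condition polling.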

Now, let's integrate Undetected ChromeDriver into the Java project.

To get started, add the following XML snippet to the dependencies section of your pom.xml.

pom.xml
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-chrome-driver</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-support</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-api</artifactId>
    <version>4.8.2</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-chrome-driver</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-support</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Then, copy the undetected-chromedriver-for-java source code into your project's directory structure.

If your project only contains the initial Selenium script, you can simply replace the src folder in the project root directory with the library's src folder.

After that, navigate to the Test.java file within the new src directory. You'll find the following boilerplate code.

Test.java
import com.frogking.chromedriver.ChromeDriverBuilder;
 
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
 
import java.io.File;
 
public class Test {
 
    public static void main(String[] args){
 
        String driver_home = "your driver home";
 
        // 1  if use chromeOptions, recommend use this
        // ChromeDriverBuilder could throw RuntimeError, you can catch it, *catch it is unnecessary
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        //chrome_options.addArguments("--headless=new"); when chromedriver > 108.x.x.x
        //chrome_options.addArguments("--headless=chrome"); when chromedriver <= 108.x.x.x
 
        ChromeDriverService service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File(driver_home))
                .usingAnyFreePort()
                .build();
 
        //ChromeDriver chromeDriver1 = new ChromeDriver(service);
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options,driver_home);
 
        // 2  don't use chromeOptions
        //ChromeDriver chromeDriver2 = new ChromeDriverBuilder()
        //        .build("your driver home");
 
        chromeDriver1.get("your url");
        //chromeDriver2.get("your url");
 
    }
}

This code already imports the ChromeDriverBuilder class, which the Java library provides for patching the standard ChromeDriver. To verify that it works, navigate to the target website (https://www.scrapingcourse.com/antibot-challenge) and take a screenshot.

But first, replace "your driver home" with the path to your ChromeDriver executable. Then, enter the target URL and add the code to take a screenshot.
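Hard-coding the driver path works, but it ties the script to one machine. A minimal sketch of reading it from an environment variable instead, with a fallback and a fail-fast existence check. The CHROMEDRIVER_PATH variable name and the DriverPath class are hypothetical choices for illustration.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper: resolves the ChromeDriver path instead of hard-coding it.
final class DriverPath {

    // Assumed environment variable name; pick whatever fits your setup.
    static final String ENV_VAR = "CHROMEDRIVER_PATH";

    private DriverPath() {}

    // Returns the path from `env` if it is set and non-blank, otherwise `fallback`.
    static String resolve(Map<String, String> env, String fallback) {
        String fromEnv = env.get(ENV_VAR);
        return (fromEnv == null || fromEnv.isBlank()) ? fallback : fromEnv;
    }

    // Fails fast with a clear message if no file exists at the given path.
    static String requireExecutable(String path) {
        File file = new File(path);
        if (!file.isFile()) {
            throw new IllegalArgumentException("ChromeDriver not found at: " + path);
        }
        return path;
    }

    // Usage:
    // String driver_home = DriverPath.requireExecutable(
    //         DriverPath.resolve(System.getenv(), "C:\\path\\to\\Chromedriver.exe"));
}
```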

You'll have the following final code:

Test.java
import com.frogking.chromedriver.ChromeDriverBuilder;
 
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
 
import java.io.File;
import java.io.IOException;
 
public class Test {
 
    public static void main(String[] args){
 
        String driver_home = "C:\\path\\to\\Chromedriver.exe";
 
        // define the necessary chrome options
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        chrome_options.addArguments("--headless=new");
 
        // create a ChromeDriverBuilder instance and build the patched ChromeDriver
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options, driver_home);
 
        try {
            // navigate to target website (inside try so the finally block always quits the browser)
            chromeDriver1.get("https://www.scrapingcourse.com/antibot-challenge");
 
            // wait for 3 seconds (3000 milliseconds)
            Thread.sleep(3000);
 
            // take a screenshot of the page
            File scrFile = ((TakesScreenshot) chromeDriver1).getScreenshotAs(OutputType.FILE);
            // save the screenshot to the desired location
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
 
            System.out.println("Screenshot saved as screenshot.png");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            // close the browser
            chromeDriver1.quit();
        }
 
    }
}

Run it, and you'll get the following screenshot.png file:

scrapingcourse cloudflare blocked screenshot

The screenshot shows that Undetected ChromeDriver is still stuck on the CAPTCHA page.

While Undetected ChromeDriver was once a go-to solution for web scraping, it has fallen behind modern anti-bot systems and WAFs. Luckily, other tools, such as web scraping APIs, can bypass CAPTCHAs and anti-bots.

Method #2: Bypass CAPTCHA With a Web Scraping API

The best solution for avoiding CAPTCHAs is to use ZenRows' Universal Scraper API. It provides a full-fledged anti-bot bypass toolkit, including JavaScript rendering, CAPTCHA bypass, automatic header management, premium proxy rotation, and more.

Let's see how ZenRows performs against a protected page like the anti-bot challenge page.

Start by signing up for a new account, and you'll get to the Request Builder.

building a scraper with zenrows

Paste the target URL, enable JS Rendering, and activate Premium Proxies.

Next, select Java and click on the API connection mode. Then, copy the generated code and paste it into your script.

Main.java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https://www.scrapingcourse.com/antibot-challenge&js_render=true&premium_proxy=true"))
            .build();

        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
            .thenApply(HttpResponse::body)
            .thenAccept(System.out::println)
            .join();
    }
}
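The generated code pastes the target URL into the query string as-is. That works for this page, but if a target URL had its own query string, its & and = characters would be read as extra API parameters. Here's a minimal sketch of building the same request URL with the target URL-encoded. ZenRowsUrl is a hypothetical helper, and the parameters are the ones from the generated code.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical helper that assembles the API URL from its parts.
final class ZenRowsUrl {

    private static final String ENDPOINT = "https://api.zenrows.com/v1/";

    private ZenRowsUrl() {}

    // Builds the request URL with JS rendering and premium proxies enabled,
    // URL-encoding both the API key and the target URL.
    static URI build(String apiKey, String targetUrl) {
        Objects.requireNonNull(apiKey, "apiKey must not be null");
        Objects.requireNonNull(targetUrl, "targetUrl must not be null");
        String query = "apikey=" + encode(apiKey)
                + "&url=" + encode(targetUrl)
                + "&js_render=true"
                + "&premium_proxy=true";
        return URI.create(ENDPOINT + "?" + query);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Usage (ZENROWS_API_KEY is an assumed environment variable name):
    // URI uri = ZenRowsUrl.build(System.getenv("ZENROWS_API_KEY"),
    //         "https://www.scrapingcourse.com/antibot-challenge");
}
```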

The generated code uses the HTTP client built into the JDK (java.net.http, available since Java 11), so you don't need to add any extra dependencies to run it.
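The generated code also prints the response body whatever status the API returns, so an error response (for example, from a wrong API key) would look like ordinary output. A synchronous sketch that checks the status code first, using the same java.net.http classes; ApiFetch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: synchronous fetch that fails loudly on non-2xx responses.
final class ApiFetch {

    private ApiFetch() {}

    // Sends a GET request and returns the body only if the status is 2xx.
    static String fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder().uri(uri).build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return checkStatus(response.statusCode(), response.body());
    }

    // Returns the body for 2xx statuses; otherwise throws with the status and body.
    static String checkStatus(int status, String body) {
        if (status < 200 || status >= 300) {
            throw new IllegalStateException("Request failed with HTTP " + status + ": " + body);
        }
        return body;
    }

    // Usage:
    // String html = ApiFetch.fetch(HttpClient.newHttpClient(), uri);
}
```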

Run the code and you'll successfully access the page:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! 🎉 You've successfully bypassed the anti-bot challenge page using ZenRows. This works for any website.

Conclusion

To bypass CAPTCHAs using Selenium, you must completely emulate natural user behavior. While Undetected ChromeDriver can patch most of Selenium's automation properties, advanced anti-bot systems can still detect it.

For guaranteed results, you need ZenRows, a web scraping API that provides an all-in-one solution for bypassing all CAPTCHAs and scraping any website. Try ZenRows for free.
