Running into CAPTCHAs while web scraping is frustrating, and a basic scraper has no way to get past them. Luckily, there are a few solutions to this problem.
In this tutorial, you'll learn how to bypass CAPTCHAs with Selenium Java using the following methods.
- Method #1: Use Undetected ChromeDriver with Selenium and Java.
- Method #2: Bypass CAPTCHA with a web scraping API.
Can Selenium and Java Handle CAPTCHA?
Yes, Selenium Java is adaptable and can handle CAPTCHAs. However, your approach should depend on the target website's configuration.
Some websites only display CAPTCHA challenges when they suspect bot-like activities, while others use CAPTCHAs as part of their initial configuration to immediately mitigate automated access.
In the first case, you can avoid the challenge altogether by emulating natural browsing behavior. In the second scenario, the CAPTCHA needs to be solved by a human being.
While you can configure Selenium Java for both scenarios, solving CAPTCHAs can be challenging to scale. That's why we'll focus on the more effective approach, which is preventing CAPTCHAs from appearing in the first place.
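Emulating natural browsing behavior covers many signals, and timing is only one of them. As a minimal sketch, assuming a hypothetical `HumanPause` helper (it isn't part of Selenium), you could randomize the pauses between page interactions instead of sleeping for a fixed interval. The 1.5 to 4 second range is an arbitrary illustration, and randomized timing alone is unlikely to get past a serious anti-bot system.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: picks randomized pauses so actions are not evenly spaced. */
final class HumanPause {

    /** Returns a pause length in milliseconds within [minMillis, maxMillis]. */
    static long nextPauseMillis(long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("expected 0 <= minMillis <= maxMillis");
        }
        // nextLong's upper bound is exclusive, so add 1 to make maxMillis reachable
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1);
    }

    /** Sleeps for a randomized interval; call it between page interactions. */
    static void pause(long minMillis, long maxMillis) throws InterruptedException {
        Thread.sleep(nextPauseMillis(minMillis, maxMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        // wait somewhere between 1.5 and 4 seconds before the next click or navigation
        pause(1500, 4000);
        System.out.println("Resuming after a randomized pause");
    }
}
```

In a Selenium script, a call like `HumanPause.pause(1500, 4000)` would sit between `driver.get(...)` and the next interaction.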
Method #1: Use Undetected ChromeDriver With Selenium and Java
Modern websites often employ anti-bot mechanisms that can easily detect traditional WebDriver tools and block your web scraper. This makes it challenging to emulate natural browsing behavior using Selenium.
Luckily, Undetected ChromeDriver (UC) can help. By patching Selenium's automation properties, UC makes it harder for websites to detect automated requests, enabling you to scrape without triggering CAPTCHA challenges.
While Undetected ChromeDriver is originally a Python library, community-driven projects like Undetected-chromedriver for Java allow you to leverage its functionalities for free in your Java project.
To learn how to do it, let's try to scrape Antibot Challenge, an antibot-protected test website that presents a CAPTCHA when it suspects bot traffic.
But before we dive in, let's see how base Selenium works against this target website.
package com.example;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import java.io.File;
import java.io.IOException;
public class Main {
    public static void main(String[] args) {
        // set Chrome options for headless mode
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        WebDriver driver = new ChromeDriver(options);
        try {
            // navigate to the target URL
            driver.get("https://www.scrapingcourse.com/antibot-challenge");
            // wait for 3 seconds (3000 milliseconds)
            Thread.sleep(3000);
            // take a screenshot of the page
            File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            // save the screenshot to the desired location
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
            System.out.println("Screenshot saved as screenshot.png");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            // close the browser
            driver.quit();
        }
    }
}
Ensure you add the Apache Commons IO dependency to your pom.xml file to use the FileUtils class.
Here's the result:
The website flags the Selenium script as a bot and refuses to verify the request. This proves that you must fortify Selenium to bypass this CAPTCHA.
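If you'd rather not pull in Commons IO just to save a file, the JDK can do the copy on its own. The sketch below assumes a hypothetical `ScreenshotSaver` helper that wraps `java.nio.file.Files.copy`; it is an alternative to the `FileUtils.copyFile` call in the script above, not something the original script uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: saves a screenshot file without Apache Commons IO. */
final class ScreenshotSaver {

    /** Copies the temporary screenshot to the destination, overwriting any existing file. */
    static Path save(Path temporaryScreenshot, Path destination) throws IOException {
        return Files.copy(temporaryScreenshot, destination, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With Selenium, the call would look like `ScreenshotSaver.save(scrFile.toPath(), Path.of("screenshot.png"))`, where `scrFile` is the `File` returned by `getScreenshotAs(OutputType.FILE)`.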
Now, let's integrate Undetected ChromeDriver into the Java project.
Add the following XML snippet to the dependencies section of your pom.xml to get started.
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-chrome-driver</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-support</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-api</artifactId>
    <version>4.8.2</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-chrome-driver</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-support</artifactId>
    <version>4.8.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Then, copy the undetected-chromedriver-for-java source code and integrate it into your project's directory structure.
If your project only contains the initial Selenium script, you only need to replace the src folder in the project root directory with that of the library.
After that, navigate to the Test.java file within the new src directory. You'll find the following boilerplate code.
import com.frogking.chromedriver.ChromeDriverBuilder;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import java.io.File;
public class Test {
    public static void main(String[] args){
        String driver_home = "your driver home";
        // 1 if use chromeOptions, recommend use this
        // ChromeDriverBuilder could throw RuntimeError, you can catch it, *catch it is unnecessary
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        //chrome_options.addArguments("--headless=new"); when chromedriver > 108.x.x.x
        //chrome_options.addArguments("--headless=chrome"); when chromedriver <= 108.x.x.x
        ChromeDriverService service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File(driver_home))
                .usingAnyFreePort()
                .build();
        //ChromeDriver chromeDriver1 = new ChromeDriver(service);
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options,driver_home);
        // 2 don't use chromeOptions
        //ChromeDriver chromeDriver2 = new ChromeDriverBuilder()
        //        .build("your driver home");
        chromeDriver1.get("your url");
        //chromeDriver2.get("your url");
    }
}
This code already imports the ChromeDriverBuilder class, which the Java library provides for patching the standard ChromeDriver. To verify that it works, navigate to the target website (https://www.scrapingcourse.com/antibot-challenge) and take a screenshot.
But first, replace your driver home with the path to your ChromeDriver executable. Then, enter the target URL and add the code to take a screenshot.
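A hard-coded driver path is easy to get wrong when the project moves between machines. As a minimal sketch, the hypothetical `DriverPath` helper below reads the path from an environment variable, falls back to a default, and fails early with a readable message if the file is missing. The `CHROMEDRIVER_PATH` variable name is an assumption made for illustration; neither Selenium nor the library defines it.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: resolves and validates the ChromeDriver executable path. */
final class DriverPath {

    /** Returns the configured path when it is set and non-blank, otherwise the fallback. */
    static String choose(String configured, String fallback) {
        return (configured == null || configured.isBlank()) ? fallback : configured.trim();
    }

    /** Returns the path unchanged, or throws if it does not point to a regular file. */
    static String requireExisting(String driverPath) {
        if (!Files.isRegularFile(Path.of(driverPath))) {
            throw new IllegalArgumentException("ChromeDriver executable not found at: " + driverPath);
        }
        return driverPath;
    }

    public static void main(String[] args) {
        // CHROMEDRIVER_PATH is an assumed variable name, used here only for illustration
        String driverHome = choose(System.getenv("CHROMEDRIVER_PATH"), "C:\\path\\to\\Chromedriver.exe");
        System.out.println("Driver path: " + driverHome);
        System.out.println("Exists: " + Files.isRegularFile(Path.of(driverHome)));
    }
}
```

You could then pass `DriverPath.requireExisting(driverHome)` as the `driver_home` value so a wrong path surfaces before the browser starts.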
You'll have the following final code:
import com.frogking.chromedriver.ChromeDriverBuilder;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import java.io.File;
import java.io.IOException;
public class Test {
    public static void main(String[] args){
        String driver_home = "C:\\path\\to\\Chromedriver.exe";
        // define the necessary chrome options
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        chrome_options.addArguments("--headless=new");
        // create a ChromeDriverBuilder instance and build the patched ChromeDriver
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options,driver_home);
        // navigate to target website
        chromeDriver1.get("https://www.scrapingcourse.com/antibot-challenge");
        try {
            // wait for 3 seconds (3000 milliseconds)
            Thread.sleep(3000);
            // take a screenshot of the page
            File scrFile = ((TakesScreenshot) chromeDriver1).getScreenshotAs(OutputType.FILE);
            // save the screenshot to the desired location
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
            System.out.println("Screenshot saved as screenshot.png");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            // close the browser
            chromeDriver1.quit();
        }
    }
}
Run it, and you'll get the following screenshot.png file:
This means Undetected ChromeDriver is stuck on the CAPTCHA page.
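Instead of judging the outcome from a screenshot, you could check the page source in code. The sketch below assumes a hypothetical `ChallengeCheck` helper and relies on the success heading the test page shows once the challenge is passed ("You bypassed the Antibot challenge!", which appears in the sample HTML output later in this tutorial). If the site changes that text, the marker would need updating.

```java
/** Hypothetical helper: tells whether the Antibot Challenge page was passed. */
final class ChallengeCheck {

    // success text taken from the target page's HTML; an assumption that it stays stable
    static final String SUCCESS_MARKER = "You bypassed the Antibot challenge!";

    /** Returns true only when the page source contains the success heading. */
    static boolean isBypassed(String pageSource) {
        return pageSource != null && pageSource.contains(SUCCESS_MARKER);
    }
}
```

In the script above, the check would be `ChallengeCheck.isBypassed(chromeDriver1.getPageSource())`, called after the wait and before `quit()`.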
While Undetected ChromeDriver was once a go-to solution for web scraping, it has fallen behind in the arms race against anti-bot systems. Luckily, there are solutions for bypassing CAPTCHAs and anti-bots, such as a web scraping API.
Method #2: Bypass CAPTCHA With a Web Scraping API
We've seen that free and open-source solutions are often insufficient, as websites using sophisticated anti-bot systems can easily detect and block them.
Fortunately, web scraping APIs, like ZenRows, offer the ultimate solution for bypassing all CAPTCHAs.
With features such as auto-rotating user agents, premium proxies, geolocation, and more, ZenRows provides the complete toolkit for scraping any website without getting blocked.
ZenRows also offers headless browser functionality, which means it can replace Selenium altogether. It allows you to interact with page elements and emulate natural browsing behavior without additional overhead.
Below is a step-by-step guide on how to use ZenRows against the Antibot Challenge page where Undetected ChromeDriver failed.
Sign up for free, and you'll be redirected to the Request Builder page:
Input the target URL (https://www.scrapingcourse.com/antibot-challenge) and activate Premium Proxies and the JS rendering mode.
That'll generate your request code on the right. Copy it, and use your preferred HTTP client. This example uses the Fluent API of the Apache HTTP library, which you can add to your Maven project by including the following XML snippet in your pom.xml file.
<!-- https://mvnrepository.com/artifact/org.apache.httpcomponents.client5/httpclient5-fluent -->
<dependency>
    <groupId>org.apache.httpcomponents.client5</groupId>
    <artifactId>httpclient5-fluent</artifactId>
    <version>5.3.1</version>
</dependency>
Your code should be similar to this:
import org.apache.hc.client5.http.fluent.Request;
public class APIRequest {
    public static void main(final String... args) throws Exception {
        String apiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fantibot-challenge&js_render=true&premium_proxy=true";
        String response = Request.get(apiUrl)
                .execute().returnContent().asString();
        System.out.println(response);
    }
}
Run it, and you'll get the HTML content of your target web page:
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>
Good job!
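The request URL in the example above carries the target address in percent-encoded form. If you'd rather build it in code and skip the extra HTTP dependency, here is a sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11+). The `ZenRowsRequestSketch` class and the `ZENROWS_API_KEY` environment variable are names assumed for illustration; the endpoint and the `apikey`, `url`, `js_render`, and `premium_proxy` parameters are the ones from the generated request above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: builds and sends the same API request with the JDK HTTP client. */
final class ZenRowsRequestSketch {

    /** Builds the API URL, percent-encoding the key and the target URL. */
    static String buildApiUrl(String apiKey, String targetUrl) {
        return "https://api.zenrows.com/v1/"
                + "?apikey=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&url=" + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8)
                + "&js_render=true"
                + "&premium_proxy=true";
    }

    /** Creates a GET request for the given key and target without sending it. */
    static HttpRequest buildRequest(String apiKey, String targetUrl) {
        return HttpRequest.newBuilder(URI.create(buildApiUrl(apiKey, targetUrl)))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // ZENROWS_API_KEY is an assumed variable name, used here only for illustration
        String apiKey = System.getenv("ZENROWS_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            System.err.println("Set ZENROWS_API_KEY before running this sketch");
            return;
        }
        HttpRequest request = buildRequest(apiKey, "https://www.scrapingcourse.com/antibot-challenge");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Keeping the key in an environment variable also keeps it out of source control, which the inline placeholder approach doesn't do by itself.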
Conclusion
To bypass CAPTCHAs using Selenium, you must completely emulate natural user behavior. While Undetected ChromeDriver can patch most of Selenium's automation properties, advanced anti-bot systems can still detect it.
For guaranteed results, you need ZenRows, a web scraping API that provides an all-in-one solution for bypassing all CAPTCHAs and scraping any website. Try ZenRows for free.