If you're looking to avoid detection when scraping with Selenium, you're in the right place. Websites often deploy advanced mechanisms that can easily recognize traditional WebDriver automation tools and block your web scraper.
In this tutorial, you'll learn how to overcome this challenge using Undetected Chromedriver in Java. As a bonus, we'll also explore when Undetected Chromedriver isn't enough and how you can still achieve your desired results.
Why Use Undetected ChromeDriver in Java?
If you've ever attempted scraping using Selenium, you know how frustratingly easy it is for websites to flag and block your scraper. This is because the standard Chromedriver exposes automation properties, such as the navigator.webdriver flag, that anti-bot systems can easily identify.
That is where Undetected Chromedriver plays a key role. It fortifies the standard Chromedriver by patching those automation properties. This makes the browser harder for websites to detect and lets you scrape without triggering as many anti-bot mechanisms.
While this tool is primarily a Python module, Java ports of it are available. Let's see how to use one.
How to Use Undetected ChromeDriver in Java
In this tutorial, we'll guide you through integrating Undetected Chromedriver into your Java web scraper using one of the most widely used custom Selenium ChromeDriver libraries for Java. For illustration purposes, we'll scrape NowSecure, a Cloudflare-protected test website.
Prerequisites
Before we dive in, ensure you have the latest version of JDK installed for your operating system and create your Java project.
The Undetected Chromedriver Java library used in this tutorial advises against importing Selenium through Maven, especially with Selenium 4 and above. The reason is that Maven might inadvertently import some Selenium package JARs from version 3, which can lead to compatibility issues.
Therefore, import the Selenium WebDriver JAR files manually. To do this, download the JAR archive, extract it, and copy the lib folder to your project's root directory.
After that, download the Undetected Chromedriver Java library, unzip the folder, and copy its src folder and pom.xml file into the same directory.
Next, navigate to your project directory and run the following command to build your project.
mvn clean install
Tutorial
Now, in your App.java file, import the following dependencies.
import com.frogking.chromedriver.ChromeDriverBuilder;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import java.io.File;

public class App {
    public static void main(String[] args) {
        //..
    }
}
com.frogking.chromedriver.ChromeDriverBuilder imports the ChromeDriverBuilder class, a custom class provided by the Java library. It's used for building and configuring ChromeDriver instances.
org.openqa.selenium.chrome.ChromeDriver imports the ChromeDriver class from the org.openqa.selenium.chrome package. It is part of the Selenium WebDriver library and lets you control the Chrome browser from your Java code.
org.openqa.selenium.chrome.ChromeDriverService imports the ChromeDriverService class, which manages the ChromeDriver executable process and lets you start and stop it.
org.openqa.selenium.chrome.ChromeOptions imports the ChromeOptions class for configuring ChromeDriver options like window size, headless mode, etc.
java.io.File imports the File class, used here to point to the ChromeDriver executable.
Then, set the path to your ChromeDriver executable, create a new ChromeOptions instance, and define the necessary driver options.
public class App {
    public static void main(String[] args) {
        String driver_home = "your driver home";
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        chrome_options.addArguments("--headless=new"); // when chromedriver > 108.x.x.x
        //chrome_options.addArguments("--headless=chrome"); when chromedriver <= 108.x.x.x
        //..
    }
}
The code snippet above sets the window size to 1920x1080 and enables headless mode.
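The two headless flags in the comments above depend on the ChromeDriver major version. As a minimal sketch, that choice could be wrapped in a small helper. HeadlessFlag and forMajorVersion are hypothetical names that are not part of Selenium or the Undetected Chromedriver library, and the 108 cutoff is simply taken from the comments in the snippet.

```java
// Hypothetical helper (not part of Selenium or the Undetected Chromedriver library):
// picks the headless argument for a given ChromeDriver major version.
final class HeadlessFlag {
    private HeadlessFlag() {}

    // Cutoff taken from the comments in the tutorial snippet:
    // "--headless=new" above 108, "--headless=chrome" for 108 and below.
    static String forMajorVersion(int majorVersion) {
        if (majorVersion <= 0) {
            throw new IllegalArgumentException("majorVersion must be positive: " + majorVersion);
        }
        return majorVersion > 108 ? "--headless=new" : "--headless=chrome";
    }

    public static void main(String[] args) {
        System.out.println(forMajorVersion(120)); // --headless=new
        System.out.println(forMajorVersion(108)); // --headless=chrome
    }
}
```

With this in place, the options line would read chrome_options.addArguments(HeadlessFlag.forMajorVersion(120)), where 120 stands in for whatever major version your driver reports.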
Next, create a ChromeDriver service and initialize a ChromeDriver based on the predefined Chrome options and executable path.
public class App {
    public static void main(String[] args) {
        //..
        ChromeDriverService service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File(driver_home))
                .usingAnyFreePort()
                .build();
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options, driver_home);
    }
}
.usingDriverExecutable(new File(driver_home)) specifies the path to the ChromeDriver executable, and .usingAnyFreePort() allows ChromeDriver to use any available free port.
Lastly, navigate to the target website (https://nowsecure.nl) and retrieve your desired data.
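Both the service and the builder above receive the driver path as a plain string, so a wrong path only surfaces when the browser is launched. As a hedged sketch, a small pre-check using only the Java standard library could fail earlier with a clearer message. DriverPathCheck and requireDriverFile are hypothetical names, not part of Selenium or the Undetected Chromedriver library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: verifies that the configured driver path points to an existing file.
final class DriverPathCheck {
    private DriverPathCheck() {}

    // Returns the absolute path, or throws if nothing usable is found there.
    // It only checks that a regular file exists, not that it is a valid ChromeDriver binary.
    static Path requireDriverFile(String driverHome) {
        Path path = Paths.get(driverHome).toAbsolutePath();
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("No ChromeDriver file at: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        // The default below is a placeholder; pass your real driver path as the first argument.
        String driverHome = args.length > 0 ? args[0] : "chromedriver";
        try {
            System.out.println("Using driver at " + requireDriverFile(driverHome));
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Calling requireDriverFile(driver_home) before building the service keeps a path typo from showing up as a less obvious startup failure.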
Putting everything together, you should have the following complete code.
import com.frogking.chromedriver.ChromeDriverBuilder;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import java.io.File;

public class App {
    public static void main(String[] args) {
        // Path to your ChromeDriver executable
        String driver_home = "your driver home";

        // Browser options: window size and (optionally) headless mode
        ChromeOptions chrome_options = new ChromeOptions();
        chrome_options.addArguments("--window-size=1920,1080");
        //chrome_options.addArguments("--headless=new"); when chromedriver > 108.x.x.x
        //chrome_options.addArguments("--headless=chrome"); when chromedriver <= 108.x.x.x

        // Service for the ChromeDriver executable; in this listing it is only
        // referenced by the commented-out plain ChromeDriver line below
        ChromeDriverService service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File(driver_home))
                .usingAnyFreePort()
                .build();
        //ChromeDriver chromeDriver1 = new ChromeDriver(service);

        // Build the patched driver from the options and the driver path
        ChromeDriver chromeDriver1 = new ChromeDriverBuilder()
                .build(chrome_options, driver_home);

        // Open the Cloudflare-protected test page
        chromeDriver1.get("https://nowsecure.nl");
    }
}
Right-click on your Java file and select Run Java to execute your code, and you'll get the result below.
Congrats, you've bypassed anti-bot protection using Undetected Chromedriver in Java.
When Undetected ChromeDriver Isn't Enough
While Undetected ChromeDriver is a powerful tool, you can increase your chances of avoiding detection using additional customizations like rotating Selenium proxies with Java, randomizing user agents, CAPTCHA solvers, etc. These techniques take you further in the quest to mimic an actual browser.
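As a rough sketch of two of those customizations, the helper below picks a random user agent and builds the matching Chrome command-line arguments. StealthArgs, pick, and buildArgs are hypothetical names; the user agent strings and the proxy address (a documentation-only IP) are example values to replace with your own.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helper: builds Chrome arguments for a random user agent and a proxy.
final class StealthArgs {
    private StealthArgs() {}

    // Example values only; swap in user agents that match the browsers you want to mimic.
    static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36");

    // Picks one entry from the pool at random.
    static String pick(List<String> pool, Random random) {
        if (pool.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        return pool.get(random.nextInt(pool.size()));
    }

    // --user-agent and --proxy-server are standard Chrome command-line switches.
    static List<String> buildArgs(String userAgent, String proxyHostPort) {
        return List.of("--user-agent=" + userAgent, "--proxy-server=" + proxyHostPort);
    }

    public static void main(String[] args) {
        String userAgent = pick(USER_AGENTS, new Random());
        // 203.0.113.10 is a reserved documentation address, used here as a placeholder proxy.
        buildArgs(userAgent, "203.0.113.10:8080").forEach(System.out::println);
    }
}
```

The resulting list can be passed to the options object from the tutorial with chrome_options.addArguments(...), which accepts a list of strings. Picking a new user agent and proxy per browser session, rather than per request, keeps each session internally consistent.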
However, there are scenarios where Undetected Chromedriver doesn't cut it, even with the techniques mentioned above. This is common with websites that use advanced anti-bot systems. Also, Undetected Chromedriver isn't a great option for large-scale projects, as running a full browser for every page is resource-intensive and time-consuming.
As a demonstration, below is the result of trying to scrape an advanced antibot-protected web page (G2 product review page) using Undetected ChromeDriver in Java.
We get stuck on a Cloudflare challenge page, which shows that Undetected Chromedriver can't get past this more advanced anti-bot system.
Do not worry, though. ZenRows offers an alternative solution. This web scraping API provides everything you need to bypass any anti-bot system, regardless of complexity.
Like Selenium, it offers headless browser functionality but without the additional overhead. This means you can render JavaScript and mimic human behavior with only a single API call.
Let's see ZenRows in action against the G2 product page where Undetected Chromedriver failed.
To get started, sign up for free, and you'll get to the Request Builder page.
Input the target URL (https://www.g2.com/products/asana/reviews), activate the "JS Rendering" mode, and use "Premium Proxies". Also, select Java and the API options.
That'll generate your request code on the right. Copy it to your editor. It should look like this.
import org.apache.hc.client5.http.fluent.Request;

public class APIRequest {
    public static void main(final String... args) throws Exception {
        String apiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fasana%2Freviews&js_render=true&premium_proxy=true";
        String response = Request.get(apiUrl)
                .execute().returnContent().asString();
        System.out.println(response);
    }
}
Run it, and you'll get the HTML content of the page.
<!DOCTYPE html>
#...
<title>Asana Reviews 2023: Details, Pricing, & Features | G2</title>
#...
Awesome, right? That's how easy it is to scrape with ZenRows.
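The generated request above hard-codes the target URL in percent-encoded form. If you want to reuse it for other pages, a minimal sketch like the one below could build the same URL from a plain target address using only the Java standard library (Java 10 or later). ZenRowsUrl and build are hypothetical names, ZENROWS_API_KEY is an assumed environment variable, and the query parameters are the ones shown in the generated code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: assembles the API URL from the generated request for any target page.
final class ZenRowsUrl {
    private ZenRowsUrl() {}

    // Percent-encodes the key and the target URL, then appends the same
    // js_render and premium_proxy parameters used in the generated request.
    static String build(String apiKey, String targetUrl) {
        return "https://api.zenrows.com/v1/?apikey="
                + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&url=" + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8)
                + "&js_render=true&premium_proxy=true";
    }

    public static void main(String[] args) {
        // ZENROWS_API_KEY is an assumed environment variable name, not something the API requires.
        String apiKey = System.getenv("ZENROWS_API_KEY");
        if (apiKey == null) {
            apiKey = "YOUR_ZENROWS_API_KEY";
        }
        System.out.println(build(apiKey, "https://www.g2.com/products/asana/reviews"));
    }
}
```

The returned string can be passed to Request.get(...) in place of the hard-coded apiUrl.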
Conclusion
With many websites implementing anti-bot mechanisms, Undetected Chromedriver in Java is a useful addition to any web scraping arsenal. However, it isn't foolproof and doesn't work against advanced anti-bot protection.
Luckily, ZenRows, a web scraping API that enables you to scrape without getting blocked, offers an excellent alternative to Selenium and Undetected Chromedriver.