Are you getting blocked while web scraping in Java? One key factor contributing to this issue is the User-Agent header. The website may block your requests if your Java HttpClient user agent identifies you as a bot.
In this guide, you'll learn how to set a custom Java User Agent header in HttpClient to avoid detection. Let's dive in.
What Is the HttpClient User Agent?
HTTP headers convey essential information between the web client and the target server. The User-Agent header is one of the most important because it reveals details about the client making the request.
A typical User-Agent (UA) string consists of various components, including the browser name, version, operating system, and sometimes additional details like device type. For instance, below is a Google Chrome UA string.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
It tells the web server that the request comes from a Chrome browser with version 92.0.4515.159, running on Windows 10, among other details.
On the other hand, your Java HttpClient user agent informs the server that you're not requesting from an actual browser, as it typically looks like this.
Java-http-client/17.0.10
You can see yours by making a basic request to httpbin.io/user-agent.
From the examples of User-Agent strings above, it's clear how easily websites can differentiate between Java HttpClient requests and those from an actual browser. This is why setting a custom user agent is essential to avoid detection.
How to Set a Custom Java User Agent in HttpClient
Follow the steps below to set up a custom Java HttpClient user agent.
1. Getting Started
Create your Java project to kickstart your journey to a custom HttpClient user agent. The HttpClient class has shipped with the standard Java Development Kit (JDK) since Java 11, so you don't need to install anything separately or include external dependencies.
Once you have everything set up, you're ready to write your code. Below is a basic script that makes a GET request to https://httpbin.io/user-agent and retrieves its text content.
package com.example;

// import the required classes
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;

public class Main {
    public static void main(String[] args) {
        // create an instance of HttpClient
        HttpClient client = HttpClient.newHttpClient();

        // build the request using the request builder
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://httpbin.io/user-agent"))
                .build();

        // send the request asynchronously and print the response body
        client.sendAsync(request, BodyHandlers.ofString())
                .thenApply(HttpResponse::body)
                .thenAccept(System.out::println)
                .join();
    }
}
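Note that sendAsync() returns a CompletableFuture. If you prefer a simple blocking call, HttpClient also provides a synchronous send() method. Here's a minimal sketch of the same request in blocking style (the MainSync class name is just an illustration):

package com.example;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;

public class MainSync {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://httpbin.io/user-agent"))
                .build();

        // send() blocks until the response arrives; it throws
        // IOException/InterruptedException, hence the throws clause
        HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
        System.out.println(response.body());
    }
}

Either style works for the rest of this tutorial; we'll stick with sendAsync() to match the original script.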
2. Customize the User Agent
The HttpRequest.Builder class allows you to set HTTP headers using the header() method. This method takes two parameters: the header's name and value. In this case, pass "User-Agent" as the header name and your desired UA string as the header value, as in the code snippet below.
// create an HttpRequest instance
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://httpbin.io/user-agent"))
        // set the custom User-Agent header
        .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36")
        .build();
This code replaces the default user agent with a Google Chrome UA string (version 109 here, a newer release than the sample you saw earlier).
Now, apply the custom User Agent to the script from step 1, and you'll have the following complete code.
package com.example;

// import the required classes
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;

public class Main {
    public static void main(String[] args) {
        // create an instance of HttpClient
        HttpClient client = HttpClient.newHttpClient();

        // build the request using the request builder
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://httpbin.io/user-agent"))
                // set the custom User-Agent header
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36")
                .build();

        // send the request asynchronously and print the response body
        client.sendAsync(request, BodyHandlers.ofString())
                .thenApply(HttpResponse::body)
                .thenAccept(System.out::println)
                .join();
    }
}
Run the code, and your response should be the Google Chrome UA.
{
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
}
Congrats, you've changed your Java user agent in HttpClient to that of a Chrome browser.
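One practical note: java.net.http has no client-wide default headers, so the User-Agent must be set on every request. If you build many requests, a small helper method keeps the header in one place. Below is a sketch under that assumption; the newRequestBuilder name is ours, not part of the standard API:

// centralize the custom UA so every request built here carries it
private static HttpRequest.Builder newRequestBuilder(String url) {
    return HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36");
}

// usage: any request built through the helper gets the custom UA
HttpRequest request = newRequestBuilder("https://httpbin.io/user-agent").build();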
However, with a single user agent, websites can still profile your scraper over time and block it. Rotating user agents helps you avoid detection in such scenarios.
3. Use a Random User Agent with HttpClient
Websites may detect patterns in multiple requests from the same user agent and interpret them as automated traffic. You can overcome this issue by varying the User-Agent header per request.
To rotate User-Agent headers, maintain a list of different User-Agent strings and randomly select one for each HTTP request.
Here's how you can modify your previous code to achieve this.
Start by creating an array or list containing all the user agent strings you want to use. Ensure you import the required Java classes (List and Random). For this example, we've selected a few UAs from this list of web-scraping user agents.
package com.example;

// import the required classes
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.List;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        // define a list of User-Agent strings
        List<String> userAgents = List.of(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
                // add more User-Agent strings as needed
        );
    }
}
After that, randomly select a User-Agent string. To do this, create a random number generator and use it to pick an entry from the list.
public class Main {
    public static void main(String[] args) {
        // ...

        // randomly select a UA from the list
        Random random = new Random();
        String randomUserAgent = userAgents.get(random.nextInt(userAgents.size()));
    }
}
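A side note for multithreaded scrapers: java.util.Random is thread-safe but can become a contention point when shared across threads. ThreadLocalRandom from java.util.concurrent is a drop-in alternative for this selection step, as in this sketch:

import java.util.concurrent.ThreadLocalRandom;

// thread-safe selection with no shared Random instance
String randomUserAgent = userAgents.get(
        ThreadLocalRandom.current().nextInt(userAgents.size()));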
Lastly, set the selected User-Agent string as the value of the "User-Agent" header in the HTTP request and send the request.
public class Main {
    public static void main(String[] args) {
        // ...

        // create an instance of HttpClient
        HttpClient client = HttpClient.newHttpClient();

        // build an HTTP request with the randomly selected User-Agent header
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://httpbin.io/user-agent"))
                .header("User-Agent", randomUserAgent) // set a random User-Agent header
                .build();

        // send the request asynchronously and print the response body
        client.sendAsync(request, BodyHandlers.ofString())
                .thenApply(HttpResponse::body)
                .thenAccept(System.out::println)
                .join();
    }
}
Putting everything together, you'll have the following complete code.
package com.example;

// import the required classes
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.List;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        // define a list of User-Agent strings
        List<String> userAgents = List.of(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
                // add more User-Agent strings as needed
        );

        // randomly select a UA from the list
        Random random = new Random();
        String randomUserAgent = userAgents.get(random.nextInt(userAgents.size()));

        // create an instance of HttpClient
        HttpClient client = HttpClient.newHttpClient();

        // build an HTTP request with the randomly selected User-Agent header
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://httpbin.io/user-agent"))
                .header("User-Agent", randomUserAgent) // set a random User-Agent header
                .build();

        // send the request asynchronously and print the response body
        client.sendAsync(request, BodyHandlers.ofString())
                .thenApply(HttpResponse::body)
                .thenAccept(System.out::println)
                .join();
    }
}
Every time you run the script, a different UA will be used to make your request. For example, here are our results for three requests:
{
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
}

{
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
}

{
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}
Bingo! You've successfully rotated HttpClient user agents.
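Note that the script above selects one UA per program run. In a real scraper, you'd typically send many requests from a single process and pick a fresh UA for each. Here's a sketch of that pattern; the RotatingClient class name and the three-request loop are illustrative, not part of the original code:

package com.example;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.List;
import java.util.Random;

public class RotatingClient {
    // the same UA pool as before; expand it for real-world use
    private static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
    );
    private static final Random RANDOM = new Random();

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // send three requests, each with a freshly selected User-Agent
        for (int i = 0; i < 3; i++) {
            String ua = USER_AGENTS.get(RANDOM.nextInt(USER_AGENTS.size()));
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://httpbin.io/user-agent"))
                    .header("User-Agent", ua)
                    .build();

            HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }
}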
You'll need to expand your list in real-world use cases, so paying attention to your UA construction is essential. A properly constructed UA can help ensure smooth communication between the client and the server, while an incorrectly formatted or suspicious UA might trigger anti-bot measures.
For example, if the User Agent suggests a specific browser version that doesn't exist or is outdated, websites can easily detect the discrepancy and block your scraper.
Also, your UA string must match other HTTP headers. If the User-Agent string identifies the client as a particular browser and version, but other HTTP headers suggest different characteristics or behaviors, it could signal irregularities in the request.
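For illustration, here's a sketch of pairing the Chrome UA with companion headers that tell the same story. The exact values a real Chrome build sends vary by version, so treat these as placeholders to verify against an actual browser (httpbin.io/headers echoes the headers it receives, which makes it handy for checking):

// pair the UA with companion headers that match its claimed browser and OS
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://httpbin.io/headers"))
        .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36")
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
        .header("Accept-Language", "en-US,en;q=0.9")
        .header("Sec-CH-UA-Platform", "\"Windows\"") // must agree with the Windows UA above
        .build();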
Maintaining a diverse, well-formed, and up-to-date pool of User Agents can be challenging. The following section provides an easier solution.
Avoid Getting Blocked With HttpClient in Java
Creating a reliable User Agent rotation system is more complex than managing a list. You need to update browser versions regularly, ensure they match operating systems correctly, and remove outdated combinations.
Also, websites look beyond User Agents to detect bots. They analyze your request patterns, header consistency, connection details, and more. Even with perfect User Agent rotation in HttpClient, your requests might still get blocked.
The most effective solution is to use a web scraping API like ZenRows. It provides auto-rotating, up-to-date User Agents, premium proxies, JavaScript rendering, CAPTCHA auto-bypass, and everything else you need to avoid getting blocked.
Let's see how ZenRows performs against a protected page like the Antibot Challenge page.
Start by signing up for a new account, and you'll get to the Request Builder.

Paste the target URL, enable JS Rendering, and activate Premium Proxies.
Next, select Java and click on the API connection mode. Then, copy the generated code and paste it into your script.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Main {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https://www.scrapingcourse.com/antibot-challenge&js_render=true&premium_proxy=true"))
                .build();

        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body)
                .thenAccept(System.out::println)
                .join();
    }
}
This code uses Java's built-in java.net.http.HttpClient (available since JDK 11), so no external HTTP client dependency is required.
Run the code, and you'll successfully access the page:
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>You bypassed the Antibot challenge! :D</h2>
    <!-- other content omitted for brevity -->
</body>
</html>
Congratulations! 🎉 You've successfully bypassed the anti-bot challenge page using ZenRows. The same approach works on other protected websites.
Conclusion
This guide has shown you key points about User Agents in Java's HttpClient:
- What User Agents are and how they work.
- How to set custom User Agents in your requests.
- Ways to rotate between different User Agents.
- Why User Agent management alone isn't enough.
Remember, many websites use different anti-bot mechanisms to prevent web scraping. Integrate ZenRows to make sure you extract all the data you need without getting blocked. Try ZenRows for free!