Puppeteer is a Node.js library that lets you control headless Chrome or Chromium through the DevTools Protocol. Although it's built for Node.js, you can still use it in Java, as long as you find a working Puppeteer Java wrapper.
In this tutorial, you'll learn exactly how to scrape data using Puppeteer and Java.
- Web scraping using Puppeteer in Java.
- Interacting with a web page using Puppeteer in Java.
- Avoiding getting blocked while scraping the web using Puppeteer.
How to Use Puppeteer in Java for Web Scraping
Follow the steps below to learn how to use Puppeteer in Java.
Step 1: Prerequisites
Before we dive into the code, make sure you meet the following requirements:
- Java Development Kit (JDK) 11 or newer.
- Jvppeteer (a Puppeteer Java wrapper).
- Your preferred IDE. We'll be using Visual Studio Code in this tutorial.
To start with, create a Java project and include the Jvppeteer dependency. If you're using Maven, add this XML snippet to the <dependencies> section of your pom.xml file:
<dependency>
<groupId>io.github.fanyong920</groupId>
<artifactId>jvppeteer</artifactId>
<version>3.3.2</version>
</dependency>
Check the Jvppeteer GitHub repository to ensure you use the latest version.
Step 2: Building a Basic Web Scraper in Puppeteer with Java
There is currently no official Java support for Puppeteer. However, third-party, community-driven wrappers such as Jvppeteer allow you to use Puppeteer in Java.
Jvppeteer is an open-source Puppeteer port that provides a Java interface for controlling Chrome and Firefox via the DevTools Protocol and WebDriver BiDi.
This means you can access Puppeteer functionalities, including automating browser interactions, rendering JavaScript, and simulating human browsing in your Java web scraper.
Note that Jvppeteer doesn't implement all Puppeteer features; unsupported functions throw an UnsupportedOperationException.
However, it's still a good option for simple web scraping operations.
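Since unsupported calls only fail at runtime, you can guard them defensively. Here's a minimal sketch; the try block is a placeholder for whichever wrapper feature you want to probe:

try {
    // invoke the Jvppeteer feature you need here; unimplemented ones
    // throw an UnsupportedOperationException at runtime
} catch (UnsupportedOperationException e) {
    System.err.println("Not supported by Jvppeteer: " + e.getMessage());
}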
For this tutorial, we'll use the ScrapingCourse E-commerce test site as the target page.

Here's a simple Puppeteer Java scraper using Jvppeteer.
It starts a Chrome browser, opens the target webpage, and downloads the HTML content.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
public class Main {
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in GUI mode
//.headless(false)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/ecommerce/");
// retrieve the page's HTML content
String pageContent = page.content();
System.out.println(pageContent);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Jvppeteer runs in headless mode by default. You'll need to set the headless launch option to false to launch the browser's GUI.
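For example, to watch the browser while debugging your scraper, build the launch options like this:

// launch with a visible browser window (headful mode)
LaunchOptions launchOptions = LaunchOptions.builder()
        .headless(false)
        .build();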
Here's the result of running the scraper:
<html lang="en">
<head>
<!-- ... -->
<title>
Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com
</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<div class="beta site-title">
<a href="https://www.scrapingcourse.com/ecommerce/" rel="home">
Ecommerce Test Site to Learn Web Scraping
</a>
</div>
<!-- other content omitted for brevity -->
</body>
</html>
Remember, websites can easily flag and block CDP-based tools due to their automation properties.
If you're getting blocked, consider the ZenRows Scraping Browser to bypass the website's bot detection. We'll explore this solution in more detail in a subsequent section.
Step 3: Parse Data from the Page
Parsing data from the downloaded HTML involves instructing Puppeteer to go through the DOM, locate elements, and extract their text content.
For this purpose, Puppeteer provides two main methods for traversing the HTML document: XPath and CSS selectors.
Check out our XPath vs CSS selectors comparison guide for a detailed comparison of the two methods.
That said, we recommend CSS selectors as they're more intuitive and beginner-friendly.
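For instance, here's how you could target a product's name either way (illustrative selectors based on the markup we'll inspect below):

// CSS: class-based, short, and readable
String cssSelector = "li.product h2.product-name";
// XPath: more verbose, but it can express conditions CSS can't
String xpathSelector = "//li[contains(@class, 'product')]//h2[contains(@class, 'product-name')]";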
In this example, we'll extract each product's name, price, and image URL.
Let's begin!
First, inspect the page and identify the CSS selectors corresponding to the data points you want. Visit the page in a browser, right-click the first product, and choose the Inspect option.
This opens up the Developer Tools window, as shown below.

Now, notice that each product is a list item with the class product, and each one contains the following data points:
- Product name: an <h2> with the class product-name.
- Product price: a <span> element with the class product-price.
- Product image: an <img> tag with the class product-image.
Using this information, select all product items on the page, iterate through them, and get the product name, price, and image URL.
Although you can do this with Puppeteer, we recommend integrating with JSoup, a Java parsing library, as it's more intuitive and offers simpler syntax.
To add JSoup to your project, include the following XML snippet in the <dependencies> section of your pom.xml:
<dependency>
<!-- jsoup HTML parser library @ https://jsoup.org/ -->
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.18.3</version>
</dependency>
After that, import the required classes and parse the downloaded HTML using JSoup.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in GUI mode
//.headless(false)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/ecommerce/");
// retrieve the page's HTML content
String pageContent = page.content();
// parse HTML using JSoup
Document document = Jsoup.parse(pageContent);
// select product items
Elements products = document.select("li.product");
// iterate through products and extract name, price, and image
for (Element product : products) {
String name = product.select(".product-name").text();
String price = product.select(".product-price").text();
String image = product.select(".product-image").attr("src");
System.out.println("Product Name: " + name);
System.out.println("Price: " + price);
System.out.println("Image URL: " + image);
System.out.println("----------------------");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
That's it!
This code extracts each product's name, price, and image URL. Here's the result:
Product Name: Abominable Hoodie
Price: $69.00
Image URL: https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh09-blue_main.jpg
----------------------
Product Name: Adrienne Trek Jacket
Price: $57.00
Image URL: https://www.scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/wj08-gray_main.jpg
----------------------
// ... omitted for brevity
Step 4: Export Scraped Data to a CSV File
Exporting data to CSV is essential for easy access and analysis. In Java, you can do so using the built-in FileWriter class.
Here's a step-by-step guide.
Import the required classes. Then, initialize an empty list and add the extracted data to this list.
package com.example;
// import the required classes
// ...
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class Main {
// initialize an empty list to store scraped product data
private static List<String[]> productData = new ArrayList<>();
public static void main(String[] args) {
// ...
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// ...
// ...scraping logic
for (Element product : products) {
// ...
// store the product details in the data list
productData.add(new String[]{name, price, image});
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
After that, initialize a FileWriter instance. Then, write the CSV headers and populate the rows with the scraped data. Let's abstract this into a reusable method.
package com.example;
// import the required classes
// ...
public class Main {
//...
public static void main(String[] args) {
//...
}
// method to export data to CSV file
private static void exportDataToCsv(String filePath) {
try (FileWriter writer = new FileWriter(filePath)) {
// write headers
writer.append("Product Name,Price,Image URL\n");
// write data rows
for (String[] row : productData) {
writer.append(String.join(",", row));
writer.append("\n");
}
System.out.println("Data saved to " + filePath);
} catch (IOException e) {
e.printStackTrace();
}
}
}
That's it!
Now, combine all the steps above and call the exportDataToCsv() method in main() to get the following complete code.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class Main {
// initialize an empty list to store scraped product data
private static List<String[]> productData = new ArrayList<>();
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in GUI mode
//.headless(false)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/ecommerce/");
// retrieve the page's HTML content
String pageContent = page.content();
// parse HTML using JSoup
Document document = Jsoup.parse(pageContent);
// select product items
Elements products = document.select("li.product");
// iterate through products and extract name, price, and image
for (Element product : products) {
String name = product.select(".product-name").text();
String price = product.select(".product-price").text();
String image = product.select(".product-image").attr("src");
// store the product details in the data list
productData.add(new String[]{name, price, image});
}
// export data to CSV
exportDataToCsv("products.csv");
} catch (Exception e) {
e.printStackTrace();
}
}
// method to export data to CSV file
private static void exportDataToCsv(String filePath) {
try (FileWriter writer = new FileWriter(filePath)) {
// write headers
writer.append("Product Name,Price,Image URL\n");
// write data rows
for (String[] row : productData) {
writer.append(String.join(",", row));
writer.append("\n");
}
System.out.println("Data saved to " + filePath);
} catch (IOException e) {
e.printStackTrace();
}
}
}
This code creates a new products.csv file in your project's root directory and exports the scraped data to it.
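One caveat: String.join(",", row) doesn't quote fields, so a product name containing a comma would corrupt its row. If that's a risk on your target site, pass each field through a small escaping helper (a hypothetical addition, not part of the code above) before joining:

// quote fields containing commas, quotes, or newlines (RFC 4180 style)
private static String escapeCsv(String field) {
    if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
    return field;
}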
Here's a sample screenshot for reference.

Congratulations! You've created a functional Puppeteer Java web scraper.
Interact With the Page Using Puppeteer in Java
Some web scraping scenarios require you to simulate user actions. In this section, you'll learn two popular browser interactions using Puppeteer in Java.
Handle Infinite Scrolling
Modern websites use infinite scrolling to update their data as you scroll down the page. This means that the entire web page content isn't loaded at once, and you must simulate the scrolling action to access the full HTML.
To scrape such pages using Puppeteer in Java, you must scroll down the page gradually until you reach the bottom, where no more content loads.
Let's put this into practice.
We'll use the following ScrapingCourse Infinite Scrolling test page as the target website for this example.

To get started, launch a Chrome browser and navigate to the target website as in the previous steps.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
public class Main {
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in GUI mode
//.headless(false)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/infinite-scrolling");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Next, simulate the scrolling action.
To do so, get the page's initial scroll height. Then, using a loop, scroll to the bottom of the page and wait a few seconds for new content to load. Break the loop once the height stops changing (meaning you've reached the actual bottom); otherwise, continue scrolling.
public class Main {
public static void main(String[] args) {
//...
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
//...
long lastHeight = ((Number) page.evaluate("() => document.body.scrollHeight")).longValue();
while (true) {
// scroll down
page.evaluate("window.scrollTo(0, document.body.scrollHeight)");
// wait for content to load
Thread.sleep(3000);
// get new scroll height
long newHeight = ((Number) page.evaluate("() => document.body.scrollHeight")).longValue();
if (newHeight == lastHeight) {
// stop scrolling if there is no more new content
break;
}
lastHeight = newHeight;
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
That's it. You've simulated the scrolling action using Puppeteer in Java.
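One caution: the while (true) loop relies on a fixed three-second wait, and a page that keeps serving new content could keep it running indefinitely. A defensive variant (a sketch using only the calls shown above) caps the number of scroll attempts:

// cap the number of scroll attempts instead of looping unconditionally
int maxScrolls = 20; // safety limit; tune it for your target page
for (int i = 0; i < maxScrolls; i++) {
    // scroll down and wait for new content to load
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)");
    Thread.sleep(3000);
    long newHeight = ((Number) page.evaluate("() => document.body.scrollHeight")).longValue();
    if (newHeight == lastHeight) {
        break; // no new content; we've reached the bottom
    }
    lastHeight = newHeight;
}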
Add your scraping logic to extract product data and export it to CSV, as in the previous steps. Then, combine the code snippets above to get the following complete code.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class Main {
// initialize an empty list to store scraped product data
private static List<String[]> productData = new ArrayList<>();
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in GUI mode
//.headless(false)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/infinite-scrolling");
long lastHeight = ((Number) page.evaluate("() => document.body.scrollHeight")).longValue();
while (true) {
// scroll down
page.evaluate("window.scrollTo(0, document.body.scrollHeight)");
// wait for content to load
Thread.sleep(3000);
// get new scroll height
long newHeight = ((Number) page.evaluate("() => document.body.scrollHeight")).longValue();
if (newHeight == lastHeight) {
// stop scrolling if there is no more new content
break;
}
lastHeight = newHeight;
}
// retrieve the page's HTML content
String pageContent = page.content();
// parse HTML using JSoup
Document document = Jsoup.parse(pageContent);
// select product items
Elements products = document.select(".product-item");
// iterate through products and extract name, price, and image
for (Element product : products) {
String name = product.select(".product-name").text();
String price = product.select(".product-price").text();
String image = product.select(".product-image").attr("src");
// store the product details in the data list
productData.add(new String[]{name, price, image});
}
// export data to CSV
exportDataToCsv("products.csv");
} catch (Exception e) {
e.printStackTrace();
}
}
// method to export data to CSV file
private static void exportDataToCsv(String filePath) {
try (FileWriter writer = new FileWriter(filePath)) {
// write headers
writer.append("Product Name,Price,Image URL\n");
// write data rows
for (String[] row : productData) {
writer.append(String.join(",", row));
writer.append("\n");
}
System.out.println("Data saved to " + filePath);
} catch (IOException e) {
e.printStackTrace();
}
}
}
This code navigates to the new target page, scrolls to the bottom, extracts the product data, and exports it to CSV.
Here's a sample screenshot for reference.

Congratulations!
Take Screenshots
Puppeteer allows you to capture screenshots at three different scopes:
- Full page: The entire web page, including parts that may require scrolling.
- Visible area: Only what's visible in the browser window.
- Specific element: A specific HTML element, such as a product card.
To capture a full-page screenshot, set the setFullPage screenshot option to true.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.ScreenshotOptions;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
public class Main {
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in headless mode
.headless(true)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/ecommerce/");
// configure screenshot options
ScreenshotOptions screenshotOptions = new ScreenshotOptions();
screenshotOptions.setPath("full_page.png");
screenshotOptions.setOmitBackground(true);
screenshotOptions.setFullPage(true);
// take a screenshot
page.screenshot(screenshotOptions);
System.out.println("Screenshot taken and saved");
} catch (Exception e) {
e.printStackTrace();
}
}
}
For a visible-area screenshot, simply omit the setFullPage option in your screenshot options.
public class Main {
public static void main(String[] args) {
//...
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
//...
// configure screenshot options
ScreenshotOptions screenshotOptions = new ScreenshotOptions();
screenshotOptions.setPath("visible_area.png");
screenshotOptions.setOmitBackground(true);
// take a screenshot
page.screenshot(screenshotOptions);
System.out.println("Screenshot taken and saved");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Lastly, to take a screenshot of a specific element, select the element using a CSS selector and call the screenshot() method on it.
Suppose you're interested in the first product on the target page. You'll need to select the element as an ElementHandle and take its screenshot. Your script will look like this:
package com.example;
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
import com.ruiyun.jvppeteer.api.core.ElementHandle;
public class Main {
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
.headless(true)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/ecommerce/");
// wait until the element is available
page.waitForSelector(".product");
// get the element
ElementHandle product = (ElementHandle) page.evaluateHandle(
"document.querySelector('.product')"
);
// take a screenshot
product.screenshot("specific_element.png");
System.out.println("Screenshot taken and saved");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Note that the ElementHandle.screenshot() method only takes a string (the file path). Hence, screenshot options are absent in this case.
Avoid Getting Blocked While Scraping With Puppeteer
Getting blocked is a common challenge when web scraping with Puppeteer. This is because the headless browser often exhibits automation properties that are easily flagged by anti-bot solutions.
Here's a Puppeteer Java script that attempts to scrape an Antibot Challenge page.
package com.example;
// import the required classes
import com.ruiyun.jvppeteer.api.core.Browser;
import com.ruiyun.jvppeteer.api.core.Page;
import com.ruiyun.jvppeteer.cdp.core.Puppeteer;
import com.ruiyun.jvppeteer.cdp.entities.LaunchOptions;
public class Main {
public static void main(String[] args) {
System.out.println("Launching browser...");
// initialize launch options
LaunchOptions launchOptions = LaunchOptions.builder()
// run in headless mode
.headless(true)
.build();
try (Browser cdpBrowser = Puppeteer.launch(launchOptions)) {
// open a new page
Page page = cdpBrowser.newPage();
// navigate to the target URL
page.goTo("https://www.scrapingcourse.com/antibot-challenge");
// retrieve the page's HTML content
String pageContent = page.content();
System.out.println(pageContent);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Here's the result:
<html lang="en">
<head>
<!-- ... -->
<title>Antibot Challenge - ScrapingCourse.com</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<p>
Verifying you are human. This may take a few seconds.
</p>
<!-- other content omitted for brevity -->
</body>
</html>
This response signifies that the website flagged Puppeteer as a bot and blocked the request.
Common recommendations for overcoming this challenge include rotating proxies and setting custom User Agents. However, these measures often fail against advanced anti-bot solutions.
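For illustration, here's what the User-Agent tweak looks like, assuming Jvppeteer mirrors Puppeteer's setUserAgent() method (check the wrapper's documentation to confirm):

// set a realistic User-Agent before navigating
// (assumes Jvppeteer exposes Puppeteer's setUserAgent API)
page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36");
page.goTo("https://www.scrapingcourse.com/antibot-challenge");

Even so, fingerprint-based defenses will usually still detect the automated browser.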
To avoid getting blocked while web scraping with Puppeteer, consider using ZenRows' Universal Scraper API. This tool is an all-in-one scraping solution that bypasses blocks and handles dynamic content extraction with its headless browser features.
A single API call is all you need to integrate the Universal Scraper API.
Here's a step-by-step guide on using the ZenRows Universal Scraper API to scrape the Antibot challenge page that blocked us earlier.
Sign up and go to the Request Builder. Then, paste the target URL in the link box and activate Premium Proxies and JS Rendering.

Select Java as your preferred programming language and choose the API connection mode. Copy the generated code and paste it into your scraper.
The generated code should look like this:
import org.apache.hc.client5.http.fluent.Request;
public class APIRequest {
public static void main(final String... args) throws Exception {
String apiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fantibot-challenge&js_render=true&premium_proxy=true";
String response = Request.get(apiUrl)
.execute().returnContent().asString();
System.out.println(response);
}
}
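If you assemble the request URL yourself instead of copying it from the Request Builder, remember to percent-encode the target URL, e.g., with the JDK's URLEncoder:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
// ...
// encode the target URL before appending it as a query parameter
String target = URLEncoder.encode("https://www.scrapingcourse.com/antibot-challenge", StandardCharsets.UTF_8);
String apiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=" + target + "&js_render=true&premium_proxy=true";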
Running the generated code returns the following result, confirming that you bypassed the challenge:
<html lang="en">
<head>
<!-- ... -->
<title>Antibot Challenge - ScrapingCourse.com</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<h2>
You bypassed the Antibot challenge! :D
</h2>
<!-- other content omitted for brevity -->
</body>
</html>
Congratulations! You can now bypass anti-bot restrictions and scrape at any scale using ZenRows' Universal Scraper API.
Conclusion
While Puppeteer is primarily a Node.js library, you can leverage its functionality in Java using a Puppeteer Java wrapper such as Jvppeteer.
However, since Puppeteer leaves traces of its automation properties, websites can easily flag your requests.
To avoid getting blocked when scraping with Puppeteer, use the ZenRows Universal Scraper API, a complete toolkit that lets you scrape any website confidently without limitations.