The Playwright vs. Puppeteer debate comes up often since both are excellent Node.js libraries for browser automation. Although they do pretty much the same thing, Puppeteer and Playwright have some notable differences.
Let's run through a quick history here: the Chrome DevTools team created Puppeteer in 2017, largely to address Selenium's reliability issues with browser automation. Microsoft later launched Playwright, which, like Puppeteer, can efficiently run complex automated tests in a browser, but this time around with more functionality built in.
So which one is the best?
Let's look at Puppeteer and Playwright's differences to see what makes each library unique.
Playwright vs. Puppeteer: What Are the Major Differences?
Puppeteer and Playwright are browser automation libraries originally designed for end-to-end testing of web apps, typically by driving a headless browser. They're used for other purposes as well, such as web scraping.
Although they have similar use cases, some key differences between the automation tools are:
- Playwright supports Python, Java, JavaScript, TypeScript, and C#, while Puppeteer supports only JavaScript and a non-official port (pyppeteer) for Python.
- Playwright supports three browsers: Chromium, Firefox and WebKit, but Puppeteer supports only Chromium.
Playwright
Playwright is an end-to-end web testing and automation library. Although the framework's primary role is to test web applications, it can also be used for web scraping purposes.
What Are the Advantages of Playwright?
- Through a single API, the library lets you use Chromium, Firefox, or WebKit for testing (see the sketch after this list). Besides that, the cross-platform framework runs smoothly on Windows, Linux, and macOS, locally or on CI.
- Playwright supports Python, TypeScript, Java, JavaScript, and C#.
- Playwright generally runs faster than comparable testing frameworks, such as Cypress.
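To illustrate the single-API point above, here's a minimal sketch that runs the same steps in Chromium, Firefox, and WebKit; example.com is just a placeholder target.
const { chromium, firefox, webkit } = require("playwright");
(async () => {
  // run the same script against all three browser engines
  for (const browserType of [chromium, firefox, webkit]) {
    const browser = await browserType.launch();
    const page = await browser.newPage();
    await page.goto("https://example.com");
    console.log(`${browserType.name()} loaded: ${await page.title()}`);
    await browser.close();
  }
})();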
What Are the Disadvantages of Playwright?
- Playwright lacks support for Ruby, PHP, and Golang.
- Instead of real devices, Playwright uses desktop browsers to emulate mobile devices.
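To show what that emulation looks like in practice, here's a minimal sketch using Playwright's bundled device descriptors; it reproduces an iPhone's viewport, user agent, and touch support in a desktop WebKit build rather than on real hardware.
const { webkit, devices } = require("playwright");
(async () => {
  // load one of Playwright's built-in device profiles
  const iPhone = devices["iPhone 13"];
  const browser = await webkit.launch();
  // apply the emulated viewport, user agent, and touch settings
  const context = await browser.newContext({ ...iPhone });
  const page = await context.newPage();
  await page.goto("https://example.com");
  console.log(await page.evaluate(() => navigator.userAgent));
  await browser.close();
})();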
Playwright Browser Options
Browser options and page methods control the testing environment.
- Headless: This determines whether you see the browser window during testing. By default, the value is set to true, so the browser runs without a visible UI. Set it to false to watch the browser during testing.
- SlowMo: This slows down Playwright's operations by the specified number of milliseconds. For example, a value of 500 delays each action by 500 milliseconds.
- DevTools: You can open Chrome DevTools automatically when the browser launches. Note that this option only works for Chromium.
await playwright.chromium.launch({ devtools: true })
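Here's a minimal sketch combining the three options in a single launch call; the values are only illustrative.
const { chromium } = require("playwright");
(async () => {
  // show the browser window, slow each action by 500 ms,
  // and open DevTools (Chromium only)
  const browser = await chromium.launch({
    headless: false,
    slowMo: 500,
    devtools: true,
  });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  await browser.close();
})();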
Playwright Page Object Methods
Here are some methods to control the launch page.
Object Methods | Meaning |
---|---|
goto() | Navigate the page to a URL. |
reload() | Refresh the page. |
evaluate() | Execute JavaScript code within the web page's context and return the result to your Node.js environment. Interact with the DOM. Alternatively, you can use $eval(), $$eval(), $(), and $$(). |
screenshot() | Take a screenshot of the page. |
setDefaultTimeout() | Set how long the browser waits for an action before throwing an error. |
keyboard.press() | Specify the key to press. |
waitForSelector() | Have the page delay further actions until a particular selector has loaded. |
locator() | Return a locator that grabs elements using one or more selector combinations. |
click() | Click the element matching the specified selector. |
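Here's a minimal sketch stringing a few of these methods together; the URL and the h1 selector are placeholders rather than part of the tutorial's target site.
const { chromium } = require("playwright");
(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  // fail any action that takes longer than 10 seconds
  page.setDefaultTimeout(10000);
  // visit the page and wait for a specific element to appear
  await page.goto("https://example.com");
  await page.waitForSelector("h1");
  // read text from the DOM and capture a screenshot
  const heading = await page.$eval("h1", (el) => el.innerText);
  console.log(heading);
  await page.screenshot({ path: "example.png" });
  await browser.close();
})();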
Web Scraping With Playwright
As a quick tutorial to ground the Playwright vs. Puppeteer comparison, let's use Playwright to scrape the product titles, prices, and image URLs from an e-commerce pagination demo site, the ScrapingCourse JS Rendering page, and save the results in a CSV file.
Start by importing the Playwright and filesystem modules.
const { chromium } = require("playwright");
const fs = require("fs");
Since Playwright's API is asynchronous and the await keyword only works inside an async function, you need to create an asynchronous function and write your scraping logic inside it.
const { chromium } = require("playwright");
const fs = require("fs");
(async () => {
// write your scraping logic here
})();
Let's write our scraping logic now!
Launch the Chromium browser and create a new context. Next, create a page object using the context's newPage() method.
// ...
(async () => {
// launch a Chromium browser
const browser = await chromium.launch();
const context = await browser.newContext();
// create a new page
const page = await context.newPage();
})();
To scrape the elements in your script, you first need to define your CSS selector strategy.
Open the target URL in your browser, then right-click and select Inspect to open DevTools. You'll notice that all the products are enclosed within a product-grid div, and the product details (product-name, product-price, and product-image) sit inside individual product-item divs.
We'll use this information in the next steps.
To scrape the target page's product details, navigate to the URL and wait for the product grid to load.
// ...
(async () => {
// ...
// navigate to the target web page
await page.goto("https://www.scrapingcourse.com/javascript-rendering", {
waitUntil: "networkidle",
});
// wait for the product grid to load
await page.waitForSelector("#product-grid .product-item", { timeout: 5000 });
})();
Using the CSS selectors, select every product item, extract its details, and store the resulting array of objects in the products variable.
// ...
(async () => {
// ...
// extract product details
const products = await page.$$eval("#product-grid .product-item", (items) => {
return items.map((item) => {
let name = item.querySelector(".product-name").innerText.trim();
let price = item.querySelector(".product-price").innerText.trim();
let imageUrl = item.querySelector(".product-image").src;
return { name, price, imageUrl };
});
});
})();
Finally, store the extracted data in a CSV file and close the browser.
// ...
(async () => {
// ...
// specify the CSV headers
const headers = ["name", "price", "imageUrl"];
// add the headers to the CSV file
let csvData = headers.join(",") + "\n";
// create CSV-formatted strings
products.forEach((product) => {
const row = Object.values(product).join(",");
csvData += row + "\n";
});
// write the extracted data to a CSV
fs.writeFile("products.csv", csvData, (err) => {
if (err) {
console.error("Error writing CSV file:", err);
} else {
console.log("CSV file written successfully.");
}
});
// close the browser
await browser.close();
})();
Here's what the complete code looks like.
const { chromium } = require("playwright");
const fs = require("fs");
(async () => {
// launch a Chromium browser
const browser = await chromium.launch();
const context = await browser.newContext();
// create a new page
const page = await context.newPage();
// navigate to the target web page
await page.goto("https://www.scrapingcourse.com/javascript-rendering", {
waitUntil: "networkidle",
});
// wait for the product grid to load
await page.waitForSelector("#product-grid .product-item", { timeout: 5000 });
// extract product details
const products = await page.$$eval("#product-grid .product-item", (items) => {
return items.map((item) => {
let name = item.querySelector(".product-name").innerText.trim();
let price = item.querySelector(".product-price").innerText.trim();
let imageUrl = item.querySelector(".product-image").src;
return { name, price, imageUrl };
});
});
// specify the CSV headers
const headers = ["name", "price", "imageUrl"];
// add the headers to the CSV file
let csvData = headers.join(",") + "\n";
// create CSV-formatted strings
products.forEach((product) => {
const row = Object.values(product).join(",");
csvData += row + "\n";
});
// write the extracted data to a CSV
fs.writeFile("products.csv", csvData, (err) => {
if (err) {
console.error("Error writing CSV file:", err);
} else {
console.log("CSV file written successfully.");
}
});
// close the browser
await browser.close();
})();
Run the script, and you'll get the scraped product data in your exported products.csv file.
And there you have it, a perfectly scraped web page using Playwright!
Puppeteer
Puppeteer is a browser automation library for JavaScript (Node.js). It downloads and uses a bundled version of Chromium by default and controls it through the Chrome DevTools Protocol, making it one of the go-to libraries for web scraping.
What Are the Advantages of Puppeteer?
- Puppeteer is easy to set up, making it simple to get started with browser automation.
- It controls Chromium directly through the Chrome DevTools Protocol.
What Are the Disadvantages of Puppeteer?
- Puppeteer supports only JavaScript (Node.js).
- Puppeteer currently supports only Chromium, although Firefox support is in development.
Browser Options in Puppeteer
Most of Playwright's browser options, such as headless, slowMo, and devtools, also work in Puppeteer.
await puppeteer.launch({ headless: false, slowMo: 500, devtools: true })
Page Object Methods in Puppeteer
Most of Playwright's page object methods also work in Puppeteer. Here are some of them.
Object Methods | Meaning |
---|---|
goto() | Navigate the page to a URL. |
goForward() | Go forward in the page history. |
goBack() | Go back to the previous page. |
reload() | Refresh the page. |
evaluate() | Execute JavaScript code within the web page's context and return the result to your Node.js environment. Interact with the DOM. Alternatively, you can use $eval(), $$eval(), $(), and $$(). |
screenshot() | Take a screenshot of the page. |
setDefaultTimeout() or setDefaultNavigationTimeout() | Set how long the browser waits for an action or navigation before throwing an error. |
keyboard.press() | Specify the key to press. |
waitForSelector() | Have the page delay further actions until a particular selector has loaded. |
waitFor() | Delay subsequent actions. |
locator() | Return a locator that grabs elements using one or more selector combinations. |
click() | Click the element matching the specified selector. |
select() | Pick an option in a select element. |
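Here's a minimal sketch of a few methods from this table that don't appear in the Playwright table above, namely goBack(), goForward(), and select(); the URLs and the commented-out selector are placeholders.
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // navigate twice, then move back and forward through the history
  await page.goto("https://example.com");
  await page.goto("https://example.org");
  await page.goBack();
  await page.goForward();
  // pick an option in a <select> element (placeholder selector and value)
  // await page.select("#country", "us");
  await page.screenshot({ path: "puppeteer-example.png" });
  await browser.close();
})();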
Web Scraping With Puppeteer
To scrape a web page using Puppeteer, import the Puppeteer module for web scraping and the fs module for saving the scraped data into a CSV file.
const puppeteer = require("puppeteer");
const fs = require("fs");
Create an asynchronous function to run the headless browser.
const puppeteer = require("puppeteer");
const fs = require("fs");
(async () => {
// write your scraping logic here
})();
Now, launch the headless browser and create a new page.
// ...
(async () => {
// launch a Chromium browser
const browser = await puppeteer.launch();
const page = await browser.newPage();
})();
Using the goto() method, visit the target page and wait for the product grid to load before scraping the data.
// ...
(async () => {
// ...
// navigate to the target web page
await page.goto("https://www.scrapingcourse.com/javascript-rendering", {
waitUntil: "networkidle0", // Puppeteer uses 'networkidle0' instead of 'networkidle'
});
// wait for the product grid to load
await page.waitForSelector("#product-grid .product-item", { timeout: 5000 });
})();
Extract the product title, price, and image URL before appending the data to the CSV file.
// ...
(async () => {
// ...
// extract product details
const products = await page.$$eval("#product-grid .product-item", (items) => {
return items.map((item) => {
let name = item.querySelector(".product-name").innerText.trim();
let price = item.querySelector(".product-price").innerText.trim();
let imageUrl = item.querySelector(".product-image").src;
return { name, price, imageUrl };
});
});
})();
Finally, export data to a CSV file and close the browser.
// ...
(async () => {
// ...
// specify the CSV headers
const headers = ["name", "price", "imageUrl"];
// add the headers to the CSV file
let csvData = headers.join(",") + "\n";
// create CSV-formatted strings
products.forEach((product) => {
const row = Object.values(product).join(",");
csvData += row + "\n";
});
// write the extracted data to a CSV
fs.writeFile("products.csv", csvData, (err) => {
if (err) {
console.error("Error writing CSV file:", err);
} else {
console.log("CSV file written successfully.");
}
});
// close the browser
await browser.close();
})();
Here's what the complete code looks like:
const puppeteer = require("puppeteer");
const fs = require("fs");
(async () => {
// launch a Chromium browser
const browser = await puppeteer.launch();
const page = await browser.newPage();
// navigate to the target web page
await page.goto("https://www.scrapingcourse.com/javascript-rendering", {
waitUntil: "networkidle0", // Puppeteer uses 'networkidle0' instead of 'networkidle'
});
// wait for the product grid to load
await page.waitForSelector("#product-grid .product-item", { timeout: 5000 });
// extract product details
const products = await page.$$eval("#product-grid .product-item", (items) => {
return items.map((item) => {
let name = item.querySelector(".product-name").innerText.trim();
let price = item.querySelector(".product-price").innerText.trim();
let imageUrl = item.querySelector(".product-image").src;
return { name, price, imageUrl };
});
});
// specify the CSV headers
const headers = ["name", "price", "imageUrl"];
// add the headers to the CSV file
let csvData = headers.join(",") + "\n";
// create CSV-formatted strings
products.forEach((product) => {
const row = Object.values(product).join(",");
csvData += row + "\n";
});
// write the extracted data to a CSV
fs.writeFile("products.csv", csvData, (err) => {
if (err) {
console.error("Error writing CSV file:", err);
} else {
console.log("CSV file written successfully.");
}
});
// close the browser
await browser.close();
})();
Run the script, and you'll get the same product data in your exported CSV file.
Congratulations, you have just scraped a web page using Puppeteer.
Playwright or Puppeteer: Which Is Faster?
Comparing Puppeteer vs. Playwright performance can get tricky, but let's find out which library comes out on top.
Let's create a third script file called performance.js and run both the Playwright and Puppeteer code in it. We'll time how long each function takes to scrape the product data from the ScrapingCourse JS Rendering demo page.
const playwrightPerformance = async () => {
// START THE TIMER
console.time('Playwright')
// Playwright scraping code
// END THE TIMER
console.timeEnd('Playwright')
}
const puppeteerPerformance = async () => {
// START THE TIMER
console.time('Puppeteer')
// Puppeteer scraping code
// END THE TIMER
console.timeEnd('Puppeteer')
}
playwrightPerformance()
puppeteerPerformance()
Let's insert the Playwright and Puppeteer scraping code into the respective functions, make sure both run in headless mode, and then run the performance.js file five times to get the average runtime.
Here are the average durations per library:
- Playwright → (7.580 + 7.372 + 6.639 + 7.411 + 7.390) / 5 = 36.392 / 5 = 7.2784s
- Puppeteer → (6.656 + 6.653 + 6.856 + 6.592 + 6.839) / 5 = 33.596 / 5 = 6.7192s
And voilà, Puppeteer wins the Puppeteer vs. Playwright debate in terms of speed!
It's worth noting that these results are based on our own test. If you feel like running yours, go ahead and use the mini-guide shared above.
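If you'd rather not time the runs by hand, here's a rough sketch for averaging several timed runs; scrapeOnce stands in for either library's scraping function from performance.js.
// time a scraping function several times and return the average in milliseconds
async function averageRuntime(scrapeOnce, runs = 5) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await scrapeOnce();
    total += Date.now() - start;
  }
  return total / runs;
}
// usage, assuming the functions defined in performance.js:
// const avgMs = await averageRuntime(playwrightPerformance);
// console.log(`Average: ${(avgMs / 1000).toFixed(3)}s`);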
Is Playwright Better Than Puppeteer?
Overall, no comparison between Puppeteer and Playwright will give you a definitive answer as to which is the better option. It depends on multiple factors, such as long-term library support, cross-browser support, and your specific need for browser automation.
Here are some of the notable features of Playwright and Puppeteer:
Feature | Playwright | Puppeteer |
---|---|---|
Supported Languages | Python, Java, JavaScript, TypeScript, and C# | JavaScript |
Supported Browsers | Chromium, Firefox, and WebKit | Chromium |
Speed | Fast | Faster |
Conclusion
As you can see, both Playwright and Puppeteer have their advantages, so you should consider the specifics of your scraping project and personal needs before making a decision.
However, the common problem with using both libraries for web scraping is that many websites detect headless browsers as bots, rendering your Playwright- or Puppeteer-based scraper useless.
ZenRows' web scraping API solves this problem. It can handle all anti-bot and CAPTCHA bypasses for you, and that's just a small portion of what it can do. Take advantage of the free trial and discover what it's like to scrape fast and uninterrupted.