
Playwright vs. Puppeteer in 2024: Which Should You Choose?

December 2, 2022 · 9 min read

The Playwright vs. Puppeteer debate is a lively one, since both are fantastic Node.js libraries for browser automation. Although they do pretty much the same thing, Puppeteer and Playwright have some notable differences.

Let's run through a quick history here:

The Chrome DevTools team created Puppeteer in 2017 to address Selenium's reliability issues with browser automation.

Microsoft later launched Playwright. Similarly to Puppeteer, it can run complex automation tests on a browser efficiently, but this time around, it introduced more tools into the testing environment.

So which one is the best?

Let's look at Puppeteer and Playwright's differences to see what makes each library unique.

Playwright vs. Puppeteer: What Are the Major Differences?

Puppeteer and Playwright are browser automation libraries that drive headless browsers, originally designed for end-to-end automated testing of web apps. They're used for other purposes as well, such as web scraping.

Although they have similar use cases, some key differences between the automation tools are:

  • Playwright officially supports Python, Java, JavaScript and C#, plus a community port for Golang, while Puppeteer supports only JavaScript (with an unofficial Python port).
  • Playwright supports three browser engines: Chromium, Firefox and WebKit, while Puppeteer supports only Chromium.

Playwright

Playwright is an end-to-end web testing and automation library. Although the primary role of the framework is to test web applications, it's possible to use it for web scraping purposes.

What Are the Advantages of Playwright?

  • Through a single API, the library lets you use Chromium, Firefox or WebKit for testing, as the sketch after this list shows. Besides that, the cross-platform framework runs fast on Windows, Linux and macOS.
  • Playwright supports Python, Java, JavaScript and C#, plus a community Golang port.
  • Playwright runs faster than most testing frameworks, such as Cypress.
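
To back up the single-API point, here's a minimal sketch that runs the same steps in all three engines (run as an ES module; the URL is a placeholder):

scraper.js
import playwright from 'playwright'

// the same API drives all three browser engines
for (const engine of ['chromium', 'firefox', 'webkit']) {
	const browser = await playwright[engine].launch()
	const page = await browser.newPage()
	await page.goto('https://example.com') // placeholder URL
	console.log(`${engine}: ${await page.title()}`)
	await browser.close()
}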

What Are the Disadvantages of Playwright?

  • Playwright lacks support for Ruby.
  • Instead of real devices, Playwright uses desktop browsers to emulate mobile devices.

Playwright Browser Options

Browser options and page methods control the testing environment.

  • Headless: This determines whether you see the browser during testing. By default, the value is set to true, so no browser window appears; change it to false to watch the browser during testing.
  • SlowMo: This slows down Playwright's operations by the given number of milliseconds. For example, a value of 500 delays each action by 500 milliseconds.
  • DevTools: You can open Chrome DevTools upon launching the target page. Note this option works for Chromium only.
scraper.js
await playwright.chromium.launch({ devtools: true })
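
These options can also be combined in a single launch call. For example, a visible browser that pauses 500 milliseconds between actions:

scraper.js
await playwright.chromium.launch({ headless: false, slowMo: 500 })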

Playwright Page Object Methods

Here are some methods for controlling the page, with a short example after the table.

| Object Method | Meaning |
| --- | --- |
| goto() | Visits a page for the first time. |
| reload() | Refreshes the page. |
| evaluate() | Runs a JavaScript function inside the page's browser context, letting you grab and manipulate DOM elements, and returns the result to Node.js. Alternatively, you can use $eval(), $$eval(), $() and $$(). |
| screenshot() | Takes a screenshot of the page. |
| setDefaultTimeout() | Sets how long the headless browser waits for an action before throwing a timeout error. |
| keyboard.press() | Presses the key you specify. |
| waitForSelector() | Delays further action until a particular selector has loaded. |
| locator() | Returns a locator that grabs elements using multiple selector combinations. |
| click() | Clicks the element matching the selector you specify. |
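
A minimal sketch chaining a few of these methods (the URL and selectors are placeholders):

scraper.js
await page.goto('https://example.com') // placeholder URL
await page.waitForSelector('#search') // placeholder selector
await page.click('#search')
await page.keyboard.press('Enter')
await page.screenshot({ path: 'page.png' })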

Web Scraping with Playwright

As a quick tutorial to back up the Playwright vs. Puppeteer debate, let's use Playwright to scrape the product titles, prices and image URLs from the Vue Storefront demo site and save the results in a CSV file.

Start by importing the Playwright and filesystem (fs) modules to save the scraped data in a CSV file.

scraper.js
import playwright from 'playwright' // web scraping 
import fs from 'fs' // saving data to CSV

Remember to specify the module type in the package.json file; otherwise, the import syntax won't work.

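A minimal package.json that enables ES-module imports looks like this:

package.json
{
	"type": "module"
}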

Since Playwright runs asynchronously and the async-await syntax only works inside an asynchronous function, create an asynchronous main function and write the scraper inside it.

scraper.js
const main = async () => { 
	// write some code 
} 
main()

The next step is to launch a browser and create a new page, so let's go ahead and launch Chromium in a headed mode.

scraper.js
const browser = await playwright.chromium.launch({ headless: false })

You've opened the browser; congratulations, you're halfway there. Now create a page object using the browser API's newPage() method.

scraper.js
const page = await browser.newPage()

To scrape Vue Storefront's product details, visit the "Kitchen" category page and sort the items by "Newest".

scraper.js
await page.goto('https://demo.vuestorefront.io/c/kitchen?sort=NEWEST')

Alternatively, you can automate the scraper to locate and click each element one at a time until you reach the target page, as the sketch below illustrates.

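Here's a rough sketch of that click-through approach; the selectors and option value are illustrative, not the storefront's actual markup:

scraper.js
await page.goto('https://demo.vuestorefront.io/')
await page.click('text=Kitchen') // illustrative selector
await page.selectOption('select.sort-by', 'NEWEST') // illustrative selector and value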

Let's create a CSV file and write its header row in readiness to scrape the titles, prices and image URLs.

scraper.js
fs.writeFileSync('products.csv', 'title,price,imageUrl\n')

You can locate the div elements containing the products using their CSS class selector and store the resulting array of element handles in the products variable.

scraper.js
const products = await page.$$('.products__grid > .sf-product-card.products__product-card')

Using a for-of loop, extract the title, price and image URL from each product element, as shown below:

scraper.js
for (const product of products) { 
	let title, price, imageUrl 
	// extracting the target portions into title, price and image urls, respectively 
	title = await page.evaluate(e => e.querySelector('.sf-product-card__title').textContent.trim(), product) 
	price = await page.evaluate(e => e.querySelector('.sf-price__regular').textContent.trim(), product) 
	imageUrl = await page.evaluate(e => e.querySelector('.sf-image.sf-image-loaded').src, product) 
	// for every loop, append the extracted data into the CSV file 
	fs.appendFile('products.csv', `${title},${price},${imageUrl}\n`, e => { if (e) console.log(e) }) 
}

Close the browser and run the script file.

scraper.js
await browser.close()
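
Then run the script from your terminal:

Terminal
node scraper.js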

And boom! There you have it, a perfectly scraped webpage using Playwright.


In case you got lost along the way, here's what the complete code looks like.

scraper.js
// in scraper.js
 
// Import the modules: playwright (web scraping) and fs (saving data to CSV) 
import playwright from 'playwright' 
import fs from 'fs' 
 
// create asynchronous main function 
const main = async () => { 
	// launch a visible chromium browser 
	const browser = await playwright.chromium.launch({ headless: false }) 
 
	// create a new page object 
	const page = await browser.newPage() 
	// visit the target page 
	await page.goto('https://demo.vuestorefront.io/c/kitchen?sort=NEWEST') 
	// create a CSV file, in readiness to save the data we are about to scrape 
	fs.writeFileSync('products.csv', 'title,price,imageUrl\n') 
 
	// download an array of divs containing the target data 
	const products = await page.$$('.products__grid > .sf-product-card.products__product-card') 
	// loop through the array, 
	for (const product of products) { 
		let title, price, imageUrl 
		// extract the title, price and image URL, respectively
		title = await page.evaluate(e => e.querySelector('.sf-product-card__title').textContent.trim(), product)
		price = await page.evaluate(e => e.querySelector('.sf-price__regular').textContent.trim(), product)
		imageUrl = await page.evaluate(e => e.querySelector('.sf-image.sf-image-loaded').src, product)
		// on each iteration, append the extracted data to the CSV file
		fs.appendFile('products.csv', `${title},${price},${imageUrl}\n`, e => { if (e) console.log(e) })
	}
	// close the browser when the scraping is done
	await browser.close() 
} 
 
// don't forget to run the main() function 
main()

Puppeteer

Puppeteer is an automation library for JavaScript (Node.js), and unlike Playwright, it downloads and uses Chromium by default. It's built on the Chrome DevTools Protocol, making it one of the go-to libraries for web scraping.

What Are the Advantages of Puppeteer?

Puppeteer simplifies getting started with browser automation: it controls Chrome through the (non-standard) DevTools Protocol, and a complete script fits in just a few lines, as the sketch below shows.
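
For example, here's a minimal sketch that opens a page and captures a screenshot (run as an ES module; the URL is a placeholder):

scraper.js
import puppeteer from 'puppeteer'

const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://example.com') // placeholder URL
await page.screenshot({ path: 'example.png' })
await browser.close()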

What Are the Disadvantages of Puppeteer?

  • Puppeteer supports only JavaScript (Node.js).
  • Although development of Firefox support is in progress, Puppeteer currently supports only Chromium.

Browser Options in Puppeteer

Most of Playwright's browser options, like headless, slowMo and devtools, also work in Puppeteer.

scraper.js
await puppeteer.launch({ headless: false, slowMo: 500, devtools: true })

Page Object Methods in Puppeteer

Similarly, most of Playwright's page object methods also work in Puppeteer. Here are some of them, with a short example after the table.

| Object Method | Meaning |
| --- | --- |
| goto() | Visits a page for the first time. |
| goForward() | Navigates forward in the history. |
| goBack() | Returns to the previous page. |
| reload() | Refreshes the page. |
| evaluate() | Runs a JavaScript function inside the page's browser context, letting you grab and manipulate DOM elements, and returns the result to Node.js. Alternatively, you can use $eval(), $$eval(), $() and $$(). |
| screenshot() | Takes a screenshot of the page. |
| setDefaultTimeout() or setDefaultNavigationTimeout() | Sets how long the headless browser waits for an action before throwing a timeout error. |
| keyboard.press() | Presses the key you specify. |
| waitForSelector() | Delays further action until a particular selector has loaded. |
| waitFor() | Delays subsequent actions. |
| locator() | Returns a locator that grabs elements using multiple selector combinations. |
| click() | Clicks the element matching the selector you specify. |
| select() | Picks an option in a select element. |
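
A short sketch using a few of the Puppeteer-specific methods (the URL, selector and option value are placeholders):

scraper.js
await page.goto('https://example.com') // placeholder URL
await page.select('select#sort', 'newest') // placeholder selector and value
await page.goBack()
await page.goForward()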

Web Scraping with Puppeteer

To scrape a webpage using Puppeteer, import the Puppeteer module for web scraping and the fs module for saving the scraped data into a CSV file.

scraper.js
import puppeteer from 'puppeteer' // web scraping 
import fs from 'fs' // saving scraped data 

Create an asynchronous function to run the headless browser.

scraper.js
const main = async () => { 
	// write some code 
} 
main()

Now launch the browser in headed mode and create a new page.

scraper.js
const browser = await puppeteer.launch({ headless: false }) 
const page = await browser.newPage()

Using the goto() method, visit the target page before scraping the data.

scraper.js
await page.goto('https://demo.vuestorefront.io/c/kitchen?sort=NEWEST')

Next, create a CSV file to store the scraped data.

scraper.js
fs.writeFileSync('products.csv', 'title,price,imageUrl\n')

Locate the elements containing the target data on the web page.

scraper.js
const products = await page.$$('.products__grid > .sf-product-card.products__product-card')

Using the for-of loop, extract the product title, price and image URL before appending the data to the CSV file.

scraper.js
for (const product of products) { 
	let title, price, imageUrl 
	// extracting the target portions into title, price and image urls, respectively 
	title = await page.evaluate(e => e.querySelector('.sf-product-card__title').textContent.trim(), product)
	price = await page.evaluate(e => e.querySelector('.sf-price__regular').textContent.trim(), product)
	imageUrl = await page.evaluate(e => e.querySelector('.sf-image.sf-image-loaded').src, product)
	// for every loop, append the extracted data into the CSV file 
	fs.appendFile('products.csv', `${title},${price},${imageUrl}\n`, e => { if (e) console.log(e) }) 
}

Lastly, close the browser and run the script.

scraper.js
await browser.close()

Congratulations, you have just scraped a web page using Puppeteer. 😀


Here's what the full code looks like:

scraper.js
// Import the modules: puppeteer (web scraping) and fs (saving data to CSV) 
import puppeteer from 'puppeteer' 
import fs from 'fs' 
 
// create asynchronous main function 
const main = async () => { 
	// launch a headed chromium browser 
	const browser = await puppeteer.launch({ headless: false }) 
 
	// create a new page object 
	const page = await browser.newPage() 
	// visit the target page 
	await page.goto('https://demo.vuestorefront.io/c/kitchen?sort=NEWEST') 
	// create a CSV file, in readiness to save the data we are about to scrape 
	fs.writeFileSync('products.csv', 'title,price,imageUrl\n') 
 
	// download an array of divs containing the target data 
	const products = await page.$$('.products__grid > .sf-product-card.products__product-card') 
	// loop through the array, 
	for (const product of products) { 
		let title, price, imageUrl 
		// extract the title, price and image URL, respectively
		title = await page.evaluate(e => e.querySelector('.sf-product-card__title').textContent.trim(), product)
		price = await page.evaluate(e => e.querySelector('.sf-price__regular').textContent.trim(), product)
		imageUrl = await page.evaluate(e => e.querySelector('.sf-image.sf-image-loaded').src, product)
		// on each iteration, append the extracted data to the CSV file
		fs.appendFile('products.csv', `${title},${price},${imageUrl}\n`, e => { if (e) console.log(e) })
	}
	// close the browser when the scraping is done
	await browser.close() 
} 
 
// don't forget to run the main() function 
main()

Playwright or Puppeteer: Which Is Faster?

Comparing Puppeteer vs. Playwright performance can get tricky, but let's find out which library comes out on top.

Let's create a third script file called performance.js and run the Playwright and Puppeteer code in it while timing how long each takes to scrape the Vue Storefront data.

performance.js
// in performance.js
 
const playwrightPerformance = async () => {
	// START THE TIMER
	console.time('Playwright')
	// Playwright scraping code
	// END THE TIMER
	console.timeEnd('Playwright')
}

const puppeteerPerformance = async () => {
	// START THE TIMER
	console.time('Puppeteer')
	// Puppeteer scraping code
	// END THE TIMER
	console.timeEnd('Puppeteer')
}

// await each run so the two benchmarks execute sequentially and don't skew each other's timings
await playwrightPerformance()
await puppeteerPerformance()

We'll insert the Playwright and Puppeteer scraping code into the respective functions, switch both launches to headless mode, and then run the performance.js file five times to get the average runtime.
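
Each run is then a single command:

Terminal
node performance.js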

Here are the average durations per library:

  • Playwright ➡️ 7.580 + 7.372 + 6.639 + 7.411 + 7.390 = 36.392; 36.392 / 5 = 7.2784s
  • Puppeteer ➡️ 6.656 + 6.653 + 6.856 + 6.592 + 6.839 = 33.596; 33.596 / 5 = 6.7192s

And voilà, Puppeteer wins the Puppeteer vs. Playwright debate in terms of speed!

It's worth noting that these results are based on our own test. If you feel like running yours, go ahead and use the mini-guide shared above.

Is Playwright Better Than Puppeteer?

Overall, no Puppeteer vs. Playwright comparison will give you a direct answer as to which is the better option. It depends on factors like long-term library support, cross-browser support and your specific browser automation needs.

Here are some of the notable features of Playwright and Puppeteer:

| Feature | Playwright | Puppeteer |
| --- | --- | --- |
| Supported languages | Python, Java, JavaScript and C# | JavaScript |
| Supported browsers | Chromium, Firefox and WebKit | Chromium |
| Speed | Fast | Faster |

Conclusion

As you can see, both Playwright and Puppeteer have their advantages, so consider the specifics of your scraping project and your personal needs before picking one of the libraries.

However, a common web scraping problem is that some websites detect bots and block headless browsers, especially if you click buttons and send many requests in quick succession. A good solution is to introduce delays before subsequent actions.

For example, you can program Puppeteer to mimic a human user by waiting 0.1 seconds after typing details into a login form before clicking the button. Yet the downside of multiple delays is that they slow your browsing, and many websites can still detect them.
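
Here's a hypothetical sketch of such a flow; the selectors and credentials are placeholders:

scraper.js
// illustrative login flow; the pause mimics a human hesitating before clicking
await page.type('#username', 'user@example.com') // placeholder selector and value
await page.type('#password', 'secret') // placeholder selector and value
await new Promise(resolve => setTimeout(resolve, 100)) // wait ~0.1 seconds
await page.click('#login-button') // placeholder selector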

The ZenRows API solves this problem: it handles all anti-bot and CAPTCHA bypassing for you, and that's just a small portion of what it's capable of. Take advantage of the free trial to find out why it's a holy grail for web scraping.

