
Playwright vs Selenium in 2023: Which is Better?

January 11, 2023 · 6 min read

Getting lost while choosing between Playwright and Selenium for web scraping isn't surprising, since both are popular open-source automation tools.

It's important to consider your scraping needs and criteria, like compatible languages, documentation and browser support. With that in mind, here are the major differences between Playwright and Selenium:

|  | Playwright | Selenium |
| --- | --- | --- |
| Compatible Languages | Java, Python, .NET (C#), TypeScript, JavaScript | Java, Python, C#, Ruby, Perl, PHP, JavaScript, Kotlin |
| Browser Support | Chromium, WebKit, Firefox | Chrome, Safari, Firefox, Opera, Edge, IE |
| Operating System Support | Windows, Linux, macOS | Windows, Linux, macOS, Solaris |
| Architecture | Headless browser instance with event-driven architecture | JSON Wire protocol on web drivers for automation |
| Prerequisites | API package is enough | Selenium bindings (for the chosen language) and browser web drivers |
| Real Device Support | Native mobile emulation and experimental real Android support | Real device clouds and remote servers |
| Community | Smaller but growing community | Huge, established community with extensive documentation |

Let's get into the details. We'll discuss their pros and cons, then walk through a real example of how to scrape a web page with each tool.

Playwright

Playwright is an end-to-end web testing and automation library developed by Microsoft. Although the primary role of the framework is to test web applications, it's possible to use it for web scraping purposes.

What Are the Advantages of Playwright?

The advantages of Playwright are:
  • It supports all modern rendering engines including Chromium, WebKit and Firefox.
  • Playwright can be used on Windows, Linux and macOS, locally or in CI.
  • It supports TypeScript, JavaScript (NodeJS), Python, .NET and Java.
  • Playwright's execution speed is faster than Selenium's.
  • Playwright supports auto-wait and performs relevant actionability checks on elements.
  • You can generate selectors by inspecting web pages and build scraping scenarios by recording your actions.
  • Playwright supports simultaneous execution and can also block unnecessary resource requests (auto-waiting and request blocking are both shown in the sketch after this list).
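
To make the auto-wait and request-blocking points concrete, here's a minimal sketch, not part of the original comparison, that aborts image and stylesheet requests with page.route() and relies on a locator's built-in auto-waiting. It targets the demo site scraped later in this article:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
	browser = p.chromium.launch()
	page = browser.new_page()
	# abort image and stylesheet requests, let everything else through
	page.route(
		"**/*",
		lambda route: route.abort()
		if route.request.resource_type in ("image", "stylesheet")
		else route.continue_(),
	)
	page.goto("https://www.scrapethissite.com/pages/simple/")
	# inner_text() auto-waits for the element, no explicit sleep needed
	print(page.locator("div.country h3").first.inner_text())
	browser.close()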

What Are the Disadvantages of Playwright?

The disadvantages of Playwright are:
  • It can handle only emulators and not real devices.
  • Compared to Selenium, Playwright doesn't have a big community.
  • It doesn't work on legacy browsers and devices.

Web Scraping with Playwright

Let's go for a quick Playwright web scraping tutorial to compare Playwright vs Selenium in terms of their scraping capabilities. We'll extract 250 table items from the first page of Scrape This Site.

Start by installing the required package if you haven't already (pip install playwright, followed by playwright install to download the browser binaries), then import it and initialize the browser instance:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
	# launch the browser instance and define a new context
	browser = p.chromium.launch()
	context = browser.new_context()

Navigate to the target web page using the page.goto() method:

page = context.new_page() 
page.goto("https://www.scrapethissite.com/pages/simple/")

Since every table entry is in a div with the country class, locate those div elements with a CSS class selector using the page.locator() method. Also, store the number of matched elements to loop through them later:

countries = page.locator("div.country") 
n_countries = countries.count()

The next step is to define an extract_data() function that pulls the name, capital, population and area from each entry:

def extract_data(entry): 
	name = entry.locator("h3").inner_text().strip("\n").strip() 
	capital = entry.locator("span.country-capital").inner_text() 
	population = entry.locator("span.country-population").inner_text() 
	area = entry.locator("span.country-area").inner_text() 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

Extract the data using the extract_data function, then close the browser instance:

data = [] 
 
for i in range(n_countries): 
	entry = countries.nth(i) 
	sample = extract_data(entry) 
	data.append(sample) 
 
browser.close()

Congratulations! You have successfully scraped the web page using Playwright. Here's what your output should look like:

[ 
	{'name': 'Andorra', 'capital': 'Andorra la Vella', 'population': '84000', 'area (km sq)': '468.0'}, 
	{'name': 'United Arab Emirates', 'capital': 'Abu Dhabi', 'population': '4975593', 'area (km sq)': '82880.0'}, 
	{'name': 'Afghanistan', 'capital': 'Kabul', 'population': '29121286', 'area (km sq)': '647500.0'}, 
	{'name': 'Antigua and Barbuda', 'capital': "St. John's", 'population': '86754', 'area (km sq)': '443.0'}, 
	{'name': 'Anguilla', 'capital': 'The Valley', 'population': '13254', 'area (km sq)': '102.0'}, 
	... 
]

And if you got lost at any point, this is the full Playwright code:

from playwright.sync_api import sync_playwright 
 
def extract_data(entry): 
	name = entry.locator("h3").inner_text().strip("\n").strip() 
	capital = entry.locator("span.country-capital").inner_text() 
	population = entry.locator("span.country-population").inner_text() 
	area = entry.locator("span.country-area").inner_text() 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area} 
 
with sync_playwright() as p: 
	# launch the browser instance and define a new context 
	browser = p.chromium.launch() 
	context = browser.new_context() 
	# open a new tab and go to the website 
	page = context.new_page() 
	page.goto("https://www.scrapethissite.com/pages/simple/") 
	page.wait_for_load_state("load") 
	# get the countries 
	countries = page.locator("div.country") 
	n_countries = countries.count() 
 
	# loop through the elements and scrape the data 
	data = [] 
 
	for i in range(n_countries): 
		entry = countries.nth(i) 
		sample = extract_data(entry) 
		data.append(sample) 
 
	browser.close()
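
If you want to persist the scraped data, an optional extension (not part of the original tutorial) is writing it to a CSV file with Python's standard csv module:

import csv

# `data` is the list of dicts produced by the scraper above
with open("countries.csv", "w", newline="", encoding="utf-8") as f:
	writer = csv.DictWriter(f, fieldnames=["name", "capital", "population", "area (km sq)"])
	writer.writeheader()
	writer.writerows(data)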

Selenium

Selenium is one of the most popular open-source tools for both web scraping and web automation. You can automate browsers, interact with UI elements and imitate user actions on web applications while scraping with Selenium. Some of Selenium's core components include the WebDriver, Selenium IDE and Selenium Grid.

What Are the Advantages of Selenium?

The advantages of Selenium are:
  • It's easy to use.
  • It can automate a wide range of browsers, including IE and mobile browsers, and even mobile apps by using Appium.
  • It supports a wide range of programming languages, like Java, C#, Python, Perl, JavaScript and Ruby.
  • It can operate on Windows, macOS and Linux.

What Are the Disadvantages of Selenium?

The disadvantages of Selenium are:
  • Compared to Playwright, Selenium requires a third-party tool to implement parallel execution (a DIY workaround is sketched after this list).
  • There's no built-in reporting support. For example, you need to use an external solution if you need to record a video.
  • It's cumbersome to scrape data from multiple tabs in Selenium.
  • It doesn't generate an execution report for debugging.
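
To illustrate that first point, here's a minimal do-it-yourself sketch, not from the original article, that parallelizes scraping with Python's standard ThreadPoolExecutor and gives each worker its own driver (assuming Selenium 4.6+, which downloads the driver binary automatically); for test suites you'd more typically reach for Selenium Grid or a runner like pytest-xdist:

from concurrent.futures import ThreadPoolExecutor

from selenium import webdriver

# two example URLs from the demo site used in this article
URLS = [
	"https://www.scrapethissite.com/pages/simple/",
	"https://www.scrapethissite.com/pages/forms/",
]

def scrape(url):
	# each worker gets its own isolated browser instance
	options = webdriver.ChromeOptions()
	options.add_argument("--headless=new")
	driver = webdriver.Chrome(options=options)
	try:
		driver.get(url)
		return driver.title
	finally:
		driver.quit()

# run both scrapers in parallel threads
with ThreadPoolExecutor(max_workers=2) as pool:
	print(list(pool.map(scrape, URLS)))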

Web Scraping with Selenium

Like we did for Playwright, let's build a simple web scraper using Selenium. Install the packages if needed (pip install selenium webdriver-manager), then import the necessary modules and configure the Selenium instance. Make sure headless mode is active by adding the --headless=new argument (recent Selenium versions removed the old options.headless attribute):

# required selenium modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# web driver manager: https://github.com/SergeyPirogov/webdriver_manager
# will help us automatically download the web driver binaries
# then we can use `Service` to manage the web driver's state.
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

Install the web drivers with WebDriverManager, then initialize the Chrome service and define the driver instance:

# this returns the path web driver downloaded 
chrome_path = ChromeDriverManager().install() 
chrome_service = Service(chrome_path) 
driver = webdriver.Chrome(service=chrome_service, options=options)

Navigate to the webpage and find the div elements that store the countries:

url = "https://www.scrapethissite.com/pages/simple/" 
driver.get(url) 
 
# get the data divs 
countries = driver.find_elements(By.CSS_SELECTOR, "div.country")

Define a function to extract the data:

def extract_data(row): 
	name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip() 
	capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text 
	population = row.find_element(By.CSS_SELECTOR, "span.country-population").text 
	area = row.find_element(By.CSS_SELECTOR, "span.country-area").text 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

Apply the map() function to extract the values from each row, then quit the web driver instance:

# process the extracted data 
data = list(map(extract_data, countries)) 
 
driver.quit()

Congrats! Here's what your output will look like after running the script:

[ 
	{'name': 'Andorra', 'capital': 'Andorra la Vella', 'population': '84000', 'area (km sq)': '468.0'}, 
	{'name': 'United Arab Emirates', 'capital': 'Abu Dhabi', 'population': '4975593', 'area (km sq)': '82880.0'}, 
	{'name': 'Afghanistan', 'capital': 'Kabul', 'population': '29121286', 'area (km sq)': '647500.0'}, 
	{'name': 'Antigua and Barbuda', 'capital': "St. John's", 'population': '86754', 'area (km sq)': '443.0'}, 
	{'name': 'Anguilla', 'capital': 'The Valley', 'population': '13254', 'area (km sq)': '102.0'}, 
	... 
]

Here's what the full code looks like:

from selenium import webdriver 
from selenium.webdriver.chrome.service import Service 
from selenium.webdriver.common.by import By 
# web driver manager: https://github.com/SergeyPirogov/webdriver_manager 
# will help us automatically download the web driver binaries 
# then we can use `Service` to manage the web driver's state. 
from webdriver_manager.chrome import ChromeDriverManager 
 
def extract_data(row): 
	name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip() 
	capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text 
	population = row.find_element(By.CSS_SELECTOR, "span.country-population").text 
	area = row.find_element(By.CSS_SELECTOR, "span.country-area").text 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area} 
 
options = webdriver.ChromeOptions() 
options.add_argument("--headless=new")
# this returns the path web driver downloaded 
chrome_path = ChromeDriverManager().install() 
# define the chrome service and pass it to the driver instance 
chrome_service = Service(chrome_path) 
driver = webdriver.Chrome(service=chrome_service, options=options) 
 
url = "https://www.scrapethissite.com/pages/simple" 
 
driver.get(url) 
# get the data divs 
countries = driver.find_elements(By.CSS_SELECTOR, "div.country") 
 
# extract the data 
data = list(map(extract_data, countries)) 
 
driver.quit()

Which Is Faster: Playwright or Selenium?

If we're talking about speed, there's only one answer: Playwright is faster than Selenium. But by how much?

To compare the speed of Selenium and Playwright, we used the time module and slightly adjusted the scripts to include timing calculations: start = time.time() at the top of each script, end = time.time() at the bottom, and the difference end - start as the elapsed time.

Here's the script for Playwright:

import time 
from playwright.sync_api import sync_playwright 
 
def extract_data(entry): 
	name = entry.locator("h3").inner_text().strip("\n").strip() 
	capital = entry.locator("span.country-capital").inner_text() 
	population = entry.locator("span.country-population").inner_text() 
	area = entry.locator("span.country-area").inner_text() 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area} 
 
start = time.time() 
with sync_playwright() as p: 
	# launch the browser instance and define a new context 
	browser = p.chromium.launch() 
	context = browser.new_context() 
	# open a new tab and go to the website 
	page = context.new_page() 
	page.goto("https://www.scrapethissite.com/pages/") 
	# click to the first page and wait while page loads 
	page.locator("a[href='/pages/simple/']").click() 
	page.wait_for_load_state("load") 
	# get the countries 
	countries = page.locator("div.country") 
	n_countries = countries.count() 
 
	data = [] 
 
	for i in range(n_countries): 
		entry = countries.nth(i) 
		sample = extract_data(entry) 
		data.append(sample) 
 
	browser.close()
end = time.time() 
 
print(f"The whole script took: {end-start:.4f}")

And here's the script used for Selenium:

import time 
from selenium import webdriver 
from selenium.webdriver.chrome.service import Service 
from selenium.webdriver.common.by import By 
# web driver manager: https://github.com/SergeyPirogov/webdriver_manager 
# will help us automatically download the web driver binaries 
# then we can use `Service` to manage the web driver's state. 
from webdriver_manager.chrome import ChromeDriverManager 
 
def extract_data(row): 
	name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip() 
	capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text 
	population = row.find_element(By.CSS_SELECTOR, "span.country-population").text 
	area = row.find_element(By.CSS_SELECTOR, "span.country-area").text 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area} 
 
# start the timer 
start = time.time() 
 
options = webdriver.ChromeOptions() 
options.add_argument("--headless=new")
# this returns the path web driver downloaded 
chrome_path = ChromeDriverManager().install() 
# define the chrome service and pass it to the driver instance 
chrome_service = Service(chrome_path) 
driver = webdriver.Chrome(service=chrome_service, options=options) 
 
url = "https://www.scrapethissite.com/pages/" 
 
driver.get(url) 
# find the link to the first page and click it
first_page = driver.find_element(By.CSS_SELECTOR, "h3.page-title a")
first_page.click()
# get the data divs
countries = driver.find_elements(By.CSS_SELECTOR, "div.country")
 
# scrape the data using extract_data function 
data = list(map(extract_data, countries)) 
 
end = time.time() 
 
print(f"The whole script took: {end-start:.4f}") 
 
driver.quit()

Here are the results after running both scripts:


And there you have it! Our speed test showed Playwright to be around five times faster than Selenium.

Selenium vs Playwright: Which Is Better?

Playwright and Selenium are both fantastic automation tools capable of seamlessly scraping a web page when used correctly. However, picking one can be tricky: the best option depends on your web scraping needs, the type of data you want to scrape, browser support and other considerations.

As a recap, here are some of the major differences between Selenium and Playwright:
  • Playwright doesn't support real devices, while Selenium can be used with real devices and remote servers.
  • Playwright has built-in parallelization support whereas Selenium requires a third-party tool.
  • Playwright executes faster than Selenium.
  • Selenium doesn't support features such as detailed reporting and video recording while Playwright has built-in support.
  • Selenium supports more browsers than Playwright.
  • Selenium supports more programming languages.

Scalability is one of the main headaches of web scrapers built on frameworks like Playwright or Selenium, as they can trigger anti-bot protections and get blocked. One of the best ways to avoid this is to use a web scraping API, like ZenRows, that is capable of bypassing anti-bots while crawling a web page.

ZenRows does this by handling all anti-bot and CAPTCHA bypass with a single API call, and that's just a small portion of what it's capable of. Test it out yourself for free to see how easy it is to get the data you care about.
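
For illustration, such a call is typically a single HTTP request. The endpoint and parameter names below are assumptions for illustration only, so check the ZenRows documentation for the exact format:

import requests

# illustrative only: the endpoint and parameter names are assumptions,
# consult the official ZenRows docs for the exact API format
response = requests.get(
	"https://api.zenrows.com/v1/",
	params={
		"apikey": "YOUR_API_KEY",  # placeholder
		"url": "https://www.scrapethissite.com/pages/simple/",
	},
)
print(response.text)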
