
Playwright vs. Selenium in 2024: Which Is Better?

October 14, 2023 · 6 min read

It's easy to get lost choosing between Playwright and Selenium for web scraping, since both are popular open-source automation tools.

It's essential to consider your scraping needs and criteria, like compatible languages, documentation, and browser support. With that in mind, here are the most significant differences between Playwright and Selenium:

|  | Playwright | Selenium |
| --- | --- | --- |
| Compatible Languages | Java, Python, .NET, C#, TypeScript, JavaScript | Java, Python, C#, Ruby, Perl, PHP, JavaScript, Kotlin |
| Browser Support | Chromium, WebKit, Firefox | Chrome, Safari, Firefox, Opera, Edge, IE |
| Operating System Support | Windows, Linux, macOS | Windows, Linux, macOS, Solaris |
| Architecture | Headless browser instance with event-driven architecture | JSON Wire protocol on web drivers for automation |
| Prerequisites | API package is enough | Selenium bindings (for the chosen programming language) and browser web drivers |
| Real Device Support | Native mobile emulation and experimental real Android support | Real device clouds and remote servers |
| Community | Small but growing | Huge, with an established collection of documentation |

Let's dive deeper. We'll discuss the pros and cons and real-life examples of scraping pages using Playwright and Selenium.

Playwright

Playwright is an end-to-end web testing and automation library developed by Microsoft. Although the framework's primary role is to test web applications, it also fits web scraping purposes.

What Are the Advantages of Playwright?

Let's explore how using Playwright can be beneficial for your scraping needs:

  • It supports all modern rendering engines, including Chromium, WebKit, and Firefox.
  • Playwright can be used on Windows, Linux, macOS, or CI.
  • It supports TypeScript, JavaScript (Node.js), Python, .NET, and Java.
  • Playwright's execution speed is faster than Selenium's.
  • The framework also supports auto-wait and performs relevant checks for elements.
  • You can generate selectors by inspecting web pages and create scraping scenarios by recording your actions.
  • Playwright supports simultaneous execution and can also block unnecessary resource requests.

What Are the Disadvantages of Playwright?

Here are its shortcomings:

  • It only supports emulated devices, not real ones.
  • Compared to Selenium, Playwright doesn't have a big community.
  • It doesn't work on legacy browsers and devices.

Web Scraping with Playwright

Let's go for a quick Playwright tutorial to help us compare Playwright vs. Selenium in terms of scraping capabilities. We'll extract 250 table items from the first page of Scrape This Site.

Start by importing the required packages and initializing the browser instance:

scraper.py
from playwright.sync_api import sync_playwright 
 
with sync_playwright() as p: 
	# launch the browser instance and define a new context 
	browser = p.chromium.launch() 
	context = browser.new_context()

Navigate to the target page using the page.goto() method:

scraper.py
page = context.new_page() 
page.goto("https://www.scrapethissite.com/pages/simple/")

Since every table entry is in a div with the country class, locate the div elements with the CSS class selectors using the page.locator() method. Also, store the number of matched elements to loop through later on:

scraper.py
countries = page.locator("div.country") 
n_countries = countries.count()

Next, extract the name, capital, population, and area with an extract_data() function. Like this:

scraper.py
def extract_data(entry): 
	name = entry.locator("h3").inner_text().strip("\n").strip() 
	capital = entry.locator("span.country-capital").inner_text() 
	population = entry.locator("span.country-population").inner_text() 
	area = entry.locator("span.country-area").inner_text() 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

Retrieve your information using the extract_data function and close the browser instance:

scraper.py
with sync_playwright() as p: 
	#...
	data = [] 
 
	for i in range(n_countries): 
		entry = countries.nth(i) 
		sample = extract_data(entry) 
		data.append(sample)
		
	print(data)		
		
	browser.close()

Congratulations! You've successfully scraped the page using Playwright. Here's what your output should look like:

Output
[ 
	{'name': 'Andorra', 'capital': 'Andorra la Vella', 'population': '84000', 'area (km sq)': '468.0'}, 
	{'name': 'United Arab Emirates', 'capital': 'Abu Dhabi', 'population': '4975593', 'area (km sq)': '82880.0'}, 
	{'name': 'Afghanistan', 'capital': 'Kabul', 'population': '29121286', 'area (km sq)': '647500.0'}, 
	{'name': 'Antigua and Barbuda', 'capital': "St. John's", 'population': '86754', 'area (km sq)': '443.0'}, 
	{'name': 'Anguilla', 'capital': 'The Valley', 'population': '13254', 'area (km sq)': '102.0'}, 
	... 
]
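Note that every scraped value comes back as a string. If you need numbers for analysis, a small post-processing step can cast them; `to_numeric` below is a hypothetical helper, not part of the tutorial:

```python
def to_numeric(record):
    # cast the string fields of one scraped record to numeric types,
    # keeping the other fields untouched
    return {
        **record,
        "population": int(record["population"]),
        "area (km sq)": float(record["area (km sq)"]),
    }

sample = {"name": "Andorra", "capital": "Andorra la Vella",
          "population": "84000", "area (km sq)": "468.0"}
print(to_numeric(sample))
# {'name': 'Andorra', 'capital': 'Andorra la Vella', 'population': 84000, 'area (km sq)': 468.0}
```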

If you got lost at any point, this is the full Playwright code:

scraper.py
from playwright.sync_api import sync_playwright 
 
def extract_data(entry): 
	name = entry.locator("h3").inner_text().strip("\n").strip() 
	capital = entry.locator("span.country-capital").inner_text() 
	population = entry.locator("span.country-population").inner_text() 
	area = entry.locator("span.country-area").inner_text() 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area} 
 
with sync_playwright() as p: 
	# launch the browser instance and define a new context 
	browser = p.chromium.launch() 
	context = browser.new_context() 
	# open a new tab and go to the website 
	page = context.new_page() 
	page.goto("https://www.scrapethissite.com/pages/simple/") 
	# get the countries 
	countries = page.locator("div.country") 
	n_countries = countries.count() 
 
	# loop through the elements and scrape the data 
	data = [] 
 
	for i in range(n_countries): 
		entry = countries.nth(i) 
		sample = extract_data(entry) 
		data.append(sample)
		
	print(data)		
		
	browser.close()
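If you want to persist the results instead of printing them, a short sketch using Python's standard csv module (an optional extension, not part of the original script) could look like this:

```python
import csv

def save_to_csv(rows, path="countries.csv"):
    # write a list of dicts to CSV, taking the column names
    # from the keys of the first row
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

save_to_csv([
    {"name": "Andorra", "capital": "Andorra la Vella",
     "population": "84000", "area (km sq)": "468.0"},
])
```

Pass the `data` list from the scraper in place of the sample row above.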

Selenium

Selenium is among the most popular open-source tools for web scraping and automation. You can automate browsers, interact with UI elements, and imitate user actions on web applications while scraping with Selenium. Some of Selenium's core components include the WebDriver, Selenium IDE, and Selenium Grid.

What Are the Advantages of Selenium?

See below the most valuable strengths of the framework:

  • It's easy to use.
  • It can automate a wide range of browsers, including IE and mobile browsers, and even mobile apps via Appium.
  • It supports a wide range of programming languages, like Java, C#, Python, Perl, JavaScript, and Ruby.
  • It can operate on Windows, macOS, and Linux.

What Are the Disadvantages of Selenium?

Here's what it lacks:

  • Compared to Playwright, Selenium requires a third-party tool to implement parallel execution.
  • There's no built-in reporting support, so you need an external solution to record a video, for example.
  • Scraping data from multiple tabs in Selenium is cumbersome.
  • It doesn't generate an execution report for debugging.

Web Scraping with Selenium

Let's build a web scraper using Selenium!

First, import the necessary modules and configure the Selenium instance. Make sure the headless mode is active by setting options.add_argument("--headless").

scraper.py
# required selenium modules 
from selenium import webdriver 
from selenium.webdriver.common.by import By 

# create ChromeOptions object
options = webdriver.ChromeOptions()
options.add_argument("--headless")

Initialize the Chrome WebDriver instance:

scraper.py
# create a new Chrome webdriver instance, passing in the options object
driver = webdriver.Chrome(options=options)

Navigate to the page and find the div elements that store the countries:

scraper.py
url = "https://www.scrapethissite.com/pages/simple/" 
driver.get(url) 
 
# get the data divs 
countries = driver.find_elements(By.CSS_SELECTOR, "div.country")

Define a function to extract the data:

scraper.py
def extract_data(row): 
	name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip() 
	capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text 
	population = row.find_element(By.CSS_SELECTOR, "span.country-population").text 
	area = row.find_element(By.CSS_SELECTOR, "span.country-area").text 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

Apply the map function to extract the values, then quit the WebDriver instance:

scraper.py
# process the extracted data 
data = list(map(extract_data, countries)) 
print(data)

driver.quit()

Congrats, you did it! Here's what your output will look like after running the script:

Output
[ 
	{'name': 'Andorra', 'capital': 'Andorra la Vella', 'population': '84000', 'area (km sq)': '468.0'}, 
	{'name': 'United Arab Emirates', 'capital': 'Abu Dhabi', 'population': '4975593', 'area (km sq)': '82880.0'}, 
	{'name': 'Afghanistan', 'capital': 'Kabul', 'population': '29121286', 'area (km sq)': '647500.0'}, 
	{'name': 'Antigua and Barbuda', 'capital': "St. John's", 'population': '86754', 'area (km sq)': '443.0'}, 
	{'name': 'Anguilla', 'capital': 'The Valley', 'population': '13254', 'area (km sq)': '102.0'}, 
	... 
]

And here's the full code:

scraper.py
# required selenium modules 
from selenium import webdriver 
from selenium.webdriver.common.by import By 

def extract_data(row): 
	name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip() 
	capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text 
	population = row.find_element(By.CSS_SELECTOR, "span.country-population").text 
	area = row.find_element(By.CSS_SELECTOR, "span.country-area").text 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

# create ChromeOptions object
options = webdriver.ChromeOptions()
options.add_argument("--headless")

# create a new Chrome webdriver instance, passing in the options object
driver = webdriver.Chrome(options=options)

url = "https://www.scrapethissite.com/pages/simple/" 
driver.get(url) 
 
# get the data divs 
countries = driver.find_elements(By.CSS_SELECTOR, "div.country")

# process the extracted data 
data = list(map(extract_data, countries)) 
print(data)

driver.quit()
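Once the rows are in memory, ordinary Python takes over for post-processing. As an illustration (using a subset of the rows shown above rather than the live scrape), here's how you might find the most populous country:

```python
# population values are scraped as strings, so cast before comparing
rows = [
    {"name": "Andorra", "population": "84000"},
    {"name": "United Arab Emirates", "population": "4975593"},
    {"name": "Afghanistan", "population": "29121286"},
]
largest = max(rows, key=lambda r: int(r["population"]))
print(largest["name"])  # Afghanistan
```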

Which Is Faster: Playwright or Selenium?

If we're talking speed in the Selenium vs. Playwright battle, there's only one answer: Playwright wins by a lot. But how much exactly? Let's check.

To compare speeds, we'll use the time module and slightly adjust the scripts to include the timing calculations. Add start = time.time() at the top of the script and end = time.time() at the bottom, then calculate the difference with end - start.
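If you'd rather not sprinkle timing calls through each script, the same measurement can be wrapped in a small reusable context manager (a sketch, not part of the original scripts):

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label="script"):
    # print the wall-clock time of the enclosed block,
    # even if the block raises an exception
    start = time.time()
    try:
        yield
    finally:
        end = time.time()
        print(f"The {label} took: {end - start:.4f}s")

with timer("sleep demo"):
    time.sleep(0.05)
```

Wrap the whole scraping body in `with timer("Playwright scrape"):` (or the Selenium equivalent) to get the same numbers without manual bookkeeping.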

Here's the Playwright script:

scraper.py
import time
from playwright.sync_api import sync_playwright

def extract_data(entry):
    name = entry.locator("h3").inner_text().strip("\n").strip()
    capital = entry.locator("span.country-capital").inner_text()
    population = entry.locator("span.country-population").inner_text()
    area = entry.locator("span.country-area").inner_text()

    return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

start = time.time()
with sync_playwright() as p:
    # launch the browser instance and define a new context
    browser = p.chromium.launch()
    context = browser.new_context()
    # open a new tab and go to the website
    page = context.new_page()
    page.goto("https://www.scrapethissite.com/pages/")
    # click to the first page and wait while the page loads
    page.locator("a[href='/pages/simple/']").click()
    page.wait_for_load_state("load")
    # get the countries
    countries = page.locator("div.country")
    n_countries = countries.count()

    data = []

    for i in range(n_countries):
        entry = countries.nth(i)
        sample = extract_data(entry)
        data.append(sample)

    browser.close()

end = time.time()

print(f"The whole script took: {end - start:.4f}")

And here's one used for Selenium:

scraper.py
import time 
from selenium import webdriver 
from selenium.webdriver.common.by import By 
 
def extract_data(row): 
	name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip() 
	capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text 
	population = row.find_element(By.CSS_SELECTOR, "span.country-population").text 
	area = row.find_element(By.CSS_SELECTOR, "span.country-area").text 
 
	return {"name": name, "capital": capital, "population": population, "area (km sq)": area} 
 
# start the timer 
start = time.time() 
 
options = webdriver.ChromeOptions() 
options.add_argument("--headless")

# create a new Chrome webdriver instance, passing in the options object
driver = webdriver.Chrome(options=options) 
 
url = "https://www.scrapethissite.com/pages/" 
 
driver.get(url) 
# get the first page and click to the link 
first_page = driver.find_element(By.CSS_SELECTOR, "h3.page-title a") 
first_page.click() 
# get the country divs
countries = driver.find_elements(By.CSS_SELECTOR, "div.country")
 
# scrape the data using extract_data function 
data = list(map(extract_data, countries)) 
 
end = time.time() 
 
print(f"The whole script took: {end-start:.4f}") 
 
driver.quit()

Add the timing code to each scraper and compare the results after running both scripts:


And there you have it! The results from our Playwright vs. Selenium speed test are clear: the former is around five times faster than the latter.

Selenium vs. Playwright: Which Is Better?

Playwright and Selenium are both fantastic automation tools capable of seamlessly scraping a web page when used right. However, picking one can be a headache, so the best choice depends on your specific needs, the type of target data, browser support, and other considerations.

Here's a recap of the primary differences between Selenium vs. Playwright:

  • Selenium can be used on real devices and remote servers, while Playwright doesn't offer this option.
  • Playwright has built-in parallelization support, whereas Selenium requires a third-party tool.
  • Playwright executes faster than Selenium.
  • Selenium doesn't support features like detailed reporting and video recording, while Playwright provides built-in support.
  • Selenium supports more browsers and programming languages than its opponent.

Scalability is among the major challenges that come with using web scrapers built on frameworks like Playwright or Selenium, as those can trigger anti-bot securities and get blocked.

How best to avoid this? Use a web scraping API like ZenRows to easily bypass all challenges while crawling and extracting your target data.

ZenRows does this by handling anti-scraping protection with a single API call, and that's just a small portion of what it's capable of. Test it for free and see how easy it is to get the data you need.


