How to Solve Playwright 403 Forbidden Error

April 2, 2024 · 8 min read

Table of contents

What is Playwright 403?
How to fix it?
- Avoid IP bans with proxies
- Customize User Agent
- Optimize request frequency
- Use a web scraping API
- Use Playwright Stealth
Conclusion

What Is Playwright 403?

The Playwright 403 error is related to the HTTP Status Code 403: Forbidden, which is common in web scraping. It implies that the server understood the request but refused to authorize it.

Beyond scraping, Playwright is widely used for automated testing, web application automation, and browser UI automation. When a Playwright script encounters a 403 error, it typically receives a response like the one below.

                    Example
                
HTTPError: 403 Client Error: Forbidden for url: https://www.g2.com/products/asana/reviews
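
Note that Playwright itself does not raise an exception when a server answers with a 403: page.goto returns the response, and you inspect its status yourself. Below is a minimal Python sketch of that check. The names is_forbidden and fetch_status are illustrative helpers, not part of Playwright.

```python
from typing import Optional


def is_forbidden(status: Optional[int]) -> bool:
    """Return True when the HTTP status code is 403 Forbidden."""
    return status == 403


def fetch_status(url: str) -> Optional[int]:
    """Open the URL in headless Chromium and return the HTTP status code."""
    # Imported lazily so is_forbidden() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        try:
            page = browser.new_page()
            # goto() returns the main response, or None in a few edge cases.
            response = page.goto(url)
            return response.status if response else None
        finally:
            browser.close()


# Offline check of the helper; fetch_status() needs Playwright and network access.
print(is_forbidden(403))
```

Calling fetch_status with your target URL and passing the result to is_forbidden tells you whether the page was blocked before you try to parse it.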


For actionable steps on how to overcome this error, read on.

How to Fix 403 Forbidden in Playwright

Below are five proven techniques that can help you overcome the Playwright 403 error.

Avoid IP bans with proxies.
Customize your user agent.
Optimize your request frequency.
Leverage a web scraping API to never get blocked.
Use Playwright Stealth extension.

Note

While the Playwright 403 error can occur in any use case, these techniques will focus on fixing the error when web scraping.


1. Avoid IP Bans With Proxies

Proxies are servers that act as a bridge between your web scraper and the target website. When web scraping, a Playwright proxy enables you to route your requests through a different IP address, allowing you to retrieve data anonymously. This can also help avoid rate limiting, one cause of the Playwright 403 error.

To implement a proxy with Playwright, launch the browser with a proxy option that holds the server address and, for authenticated proxies, the username and password as separate fields.

Below is a basic example demonstrating how to set a Playwright proxy in Python, using HTTPbin as the target website and grabbing a free proxy from FreeProxyList.

                    scraper.py
                
import asyncio

from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as playwright:
        # Launch Chromium and route its traffic through the proxy
        browser = await playwright.chromium.launch(
            proxy={
                "server": "181.129.43.3:8080",
            },
        )
        context = await browser.new_context()
        page = await context.new_page()

        # httpbin.org/ip echoes the IP address the request came from
        await page.goto("https://httpbin.org/ip")
        html_content = await page.content()
        print(html_content)

        await context.close()
        await browser.close()


asyncio.run(main())
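
Premium proxies usually require authentication. The sketch below shows one way to pass credentials as separate fields of Playwright's proxy option. The helper names, the proxy.example.com address, and the credentials are placeholders invented for illustration, so replace them with your provider's details.

```python
from typing import Dict, Optional


def build_proxy_settings(
    server: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build the dict passed to Playwright's launch(proxy=...) option."""
    settings = {"server": server}
    # Credentials go in their own fields rather than inside the server URL.
    if username is not None:
        settings["username"] = username
    if password is not None:
        settings["password"] = password
    return settings


def print_exit_ip(proxy: Dict[str, str]) -> None:
    """Load httpbin.org/ip through the proxy and print what it reports."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(proxy=proxy)
        try:
            page = browser.new_page()
            page.goto("https://httpbin.org/ip")
            print(page.content())
        finally:
            browser.close()


# Placeholder values (assumptions): swap in real proxy details before use.
proxy = build_proxy_settings("http://proxy.example.com:8080", "my_user", "my_password")
```

Calling print_exit_ip(proxy) with working credentials should print the proxy's IP address instead of your own.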

  
  

  

The snippet above uses a free proxy, but free proxies are unreliable and rarely work in real-world scraping. For dependable results, use premium proxies.

With anti-bot systems continuously evolving, rotating your premium proxies is essential to overcome rate limits and IP bans.
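
As a rough idea of what manual rotation involves, the sketch below cycles through a list of proxy servers and launches the browser with the next one for each URL. The function names and the idea of relaunching per URL are assumptions made for simplicity; relaunching is slow, and a real setup would also handle failed proxies.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def proxy_rotation(servers: List[str]) -> Iterator[Dict[str, str]]:
    """Yield Playwright proxy settings, cycling through the servers forever."""
    if not servers:
        raise ValueError("At least one proxy server is required")
    for server in cycle(servers):
        yield {"server": server}


def scrape_with_rotation(urls: List[str], servers: List[str]) -> None:
    """Visit each URL through the next proxy in the rotation."""
    # Imported lazily so proxy_rotation() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    rotation = proxy_rotation(servers)
    with sync_playwright() as playwright:
        for url in urls:
            # A fresh browser per URL keeps the example simple.
            browser = playwright.chromium.launch(proxy=next(rotation))
            try:
                page = browser.new_page()
                page.goto(url)
                print(url, len(page.content()))
            finally:
                browser.close()
```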

That seems like a lot, right?

Well, it doesn't have to be. ZenRows, a web scraping API, offers premium proxies and rotates them automatically under the hood. It can complement or replace Playwright because it provides the same headless browser functionality with everything you need to scrape without getting blocked.

To try ZenRows for free, sign up, and you'll get redirected to the Request Builder page.

Figure: Building a scraper with ZenRows.

Paste your target URL, check the box for Premium Proxies to implement proxies, and activate the JavaScript Rendering boost mode if needed.

Then, select a language (Python), and you’ll get your script ready to try. Run it, and you'll get access to the HTML content of your target website.

                    scraper.py
                
# pip install requests
import requests
 
url = "https://httpbin.io/anything"
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&[email protected]:8001"
proxies = {"http": proxy, "https": proxy}
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)


Awesome, right? ZenRows makes scraping easy.

2. Customize Your User Agent

The User-Agent (UA) string is a crucial component of the HTTP headers sent with every request. It identifies the requesting web client, telling the target server about its operating system, browser type, and other relevant details.

Websites leverage this information to tailor their responses and features. So, if you carry a bot-like UA string, you'll likely encounter the Playwright 403 error.

However, emulating the UA of popular browsers like Chrome or Firefox can increase your chances of avoiding detection.

To customize the Playwright User Agent, set the userAgent option when creating the browser context. Below is a quick example showing how to change Playwright's UA to that of a Firefox browser in JavaScript.

                    scraper.js

const { chromium } = require("playwright");

(async () => {
  // Launch the Chromium browser
  const browser = await chromium.launch();

  // Create a browser context that sends a Firefox User Agent
  const context = await browser.newContext({
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
  });

  // Create a new page in the browser context and navigate to target URL
  const page = await context.newPage();
  await page.goto("https://httpbin.io/user-agent");

  // Get the entire page content
  const pageContent = await page.content();
  console.log(pageContent);

  // Close the browser
  await browser.close();
})();
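
The same idea works in Python through the user_agent argument of new_context. The sketch below also picks the UA at random from a small pool, which is a common variation rather than something this technique requires. The pool contents and the helper names are assumptions for illustration; keep the strings in line with real, current browsers.

```python
import random
from typing import List, Optional

# Illustrative pool (assumption): the Firefox string is the one used above.
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def pick_user_agent(pool: List[str], rng: Optional[random.Random] = None) -> str:
    """Return one User Agent string chosen at random from the pool."""
    chooser = rng or random
    return chooser.choice(pool)


def fetch_reported_user_agent(user_agent: str) -> str:
    """Return the page httpbin.io serves, which echoes the UA it received."""
    # Imported lazily so pick_user_agent() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        try:
            # The custom UA applies to every page opened in this context.
            context = browser.new_context(user_agent=user_agent)
            page = context.new_page()
            page.goto("https://httpbin.io/user-agent")
            return page.content()
        finally:
            browser.close()
```

Calling fetch_reported_user_agent(pick_user_agent(USER_AGENTS)) should return a page that echoes whichever string was picked.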

  
  

  

3. Optimize Your Request Frequency

Websites often implement rate limits to regulate traffic and cap the number of requests a client can make within a given time frame. For example, Cloudflare may return error 1015 when you exceed its rate limits. To avoid triggering anti-scraping mechanisms, you must therefore keep your request frequency in check.

One way to achieve this is to add delays between requests. Pausing between requests simulates human-like browsing behavior and reduces the risk of detection.

Below is a basic code snippet showing how to introduce delays between requests using page.waitForTimeout in JavaScript.

                    scraper.js

const { chromium } = require("playwright");

(async () => {
    // Launch a browser
    const browser = await chromium.launch();

    // Create a new browser context
    const context = await browser.newContext();

    // Create a new page within the context
    const page = await context.newPage();

    try {
        // List of URLs to scrape
        const urls = ["https://example.com", "https://example.com/page2", /* Add more URLs */];

        // Loop through the URLs
        for (const url of urls) {
            // Navigate to url
            await page.goto(url);

            // Introduce a delay before the next request
            await page.waitForTimeout(2000);
        }

    } finally {
        // Close the browser
        await browser.close();
    }
})();
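
A fixed two-second pause is itself a regular pattern, so a common refinement is to randomize the delay. Here is a minimal Python sketch of that variation, assuming a base delay plus random jitter. The function names and the default values are illustrative, not recommendations from any anti-bot vendor.

```python
import random
from typing import List, Optional


def jittered_delay_ms(
    base_ms: int, jitter_ms: int, rng: Optional[random.Random] = None
) -> int:
    """Return base_ms plus a random extra between 0 and jitter_ms (inclusive)."""
    if base_ms < 0 or jitter_ms < 0:
        raise ValueError("Delays must be non-negative")
    chooser = rng or random
    return base_ms + chooser.randint(0, jitter_ms)


def visit_with_delays(urls: List[str], base_ms: int = 2000, jitter_ms: int = 1500) -> None:
    """Visit each URL, pausing for a randomized interval after every request."""
    # Imported lazily so jittered_delay_ms() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        try:
            page = browser.new_page()
            for url in urls:
                page.goto(url)
                # wait_for_timeout() takes milliseconds, like waitForTimeout in JS.
                page.wait_for_timeout(jittered_delay_ms(base_ms, jitter_ms))
        finally:
            browser.close()
```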

  
  

  

4. Leverage a Web Scraping API to Never Get Blocked

The easiest way to avoid the Playwright 403 error is to use a web scraping API. Solutions like ZenRows provide all the features necessary to scrape without getting blocked, including headless browser functionality, geolocation, and advanced anti-bot bypass.

Let's see how ZenRows works with a Cloudflare-protected page as the target URL.

Input the target URL (in this case, https://www.g2.com/products/asana/reviews), then activate JavaScript Rendering and the Premium Proxies add-on.

That'll generate your request code on the right. Copy it to run in your terminal.

Your code should look like this:

                    Terminal
                
curl "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fasana%2Freviews&js_render=true&premium_proxy=true"


Run it, and you'll get the following result.

                    Output
                
<!DOCTYPE html><head>
 
#...
<title>Asana Reviews 2023: Details, Pricing, &amp; Features | G2</title>
#...


Congrats! You've successfully avoided detection using ZenRows.
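
If you prefer to send the same request from Python, the sketch below rebuilds the URL from the curl command using only the standard library. The endpoint and the apikey, url, js_render, and premium_proxy parameters come from that command; the function names and the 60-second timeout are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"


def build_zenrows_url(
    api_key: str,
    target_url: str,
    js_render: bool = True,
    premium_proxy: bool = True,
) -> str:
    """Build the request URL, percent-encoding the target URL."""
    params = {"apikey": api_key, "url": target_url}
    if js_render:
        params["js_render"] = "true"
    if premium_proxy:
        params["premium_proxy"] = "true"
    return f"{ZENROWS_ENDPOINT}?{urlencode(params)}"


def fetch_html(api_key: str, target_url: str) -> str:
    """Send the request and return the response body as text."""
    # The timeout is an assumption; JavaScript rendering can take a while.
    with urlopen(build_zenrows_url(api_key, target_url), timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_html with your API key and the G2 URL should return the same HTML as the curl command.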

5. Use Playwright Stealth Extension

Playwright Stealth is a plugin that extends Playwright with the ability to avoid triggering anti-bot measures. The extension comes from puppeteer-extra-plugin-stealth and is used through Playwright Extra, an open-source library that enables plugins in Playwright.

The stealth extension applies various techniques to mask Playwright's automation properties, enabling you to fly under the radar. These techniques are built-in evasion modules that plug specific "property leaks". For example:

- media.codecs modifies codecs to support what Chrome uses.
- navigator.plugins emulates navigator.mimeTypes and navigator.plugins with functional mocks to match the standard Chrome used by humans.

However, while Playwright Stealth is a powerful tool, it isn't foolproof and doesn't work against advanced anti-bot protection.
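
To give a feel for what a single evasion does, the sketch below hides the navigator.webdriver flag with an init script in plain Playwright for Python. This is a hand-written illustration of one property leak, not the plugin's own code, and on its own it is far less thorough than the full set of stealth modules. The names HIDE_WEBDRIVER_SCRIPT and read_webdriver_flag are made up for this example.

```python
from typing import Optional

# JavaScript that runs before any page script and masks navigator.webdriver.
HIDE_WEBDRIVER_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
});
"""


def read_webdriver_flag(url: str) -> Optional[bool]:
    """Open the URL with the init script applied and read navigator.webdriver."""
    # Imported lazily so the constant above can be reused without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        try:
            context = browser.new_context()
            # The script is injected into every page created in this context.
            context.add_init_script(script=HIDE_WEBDRIVER_SCRIPT)
            page = context.new_page()
            page.goto(url)
            return page.evaluate("navigator.webdriver")
        finally:
            browser.close()
```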

Conclusion

The 403 error is a common occurrence when scraping with Playwright. If the error persists after trying techniques like proxies, user agent spoofing, or Playwright Stealth, consider ZenRows for guaranteed results.