What Is Playwright 403?
The Playwright 403 error corresponds to the HTTP status code 403: Forbidden, which is common in web scraping. It means that the server understood the request but refused to authorize it.
Beyond scraping, Playwright is widely used for automated testing, web application automation, and browser UI automation. When a Playwright script is blocked this way, the target responds with status 403, and the failure is typically reported like the example below.
HTTPError: 403 Client Error: Forbidden for url: https://www.g2.com/products/asana/reviews
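Note that Playwright's own navigation call does not raise an exception for a 403; it resolves with a response object whose status you have to inspect yourself. The sketch below shows one way to do that in Python. The helper names `is_forbidden` and `fetch_status` are illustrative and not part of Playwright, and the Playwright import sits inside the function only so that the small helper can be reused without a browser installed.

```python
FORBIDDEN = 403


def is_forbidden(status: int) -> bool:
    """Return True when the HTTP status code means the server refused the request."""
    return status == FORBIDDEN


async def fetch_status(url: str) -> int:
    """Open the URL in headless Chromium and return the main response's status code."""
    # Imported lazily so is_forbidden() stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        try:
            page = await browser.new_page()
            # goto() resolves with the main resource's response; it can be None
            # for same-document navigations, so fall back to 0 in that case.
            response = await page.goto(url)
            return response.status if response is not None else 0
        finally:
            await browser.close()
```

To try it, run `asyncio.run(fetch_status("https://www.g2.com/products/asana/reviews"))` and pass the result to `is_forbidden`.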
For actionable steps on how to overcome this error, read on.
How to Fix 403 Forbidden in Playwright
Below are five proven techniques that can help you overcome the Playwright 403 error.
- Avoid IP bans with proxies.
- Customize your user agent.
- Optimize your request frequency.
- Leverage a web scraping API to never get blocked.
- Use the Playwright Stealth extension.
While the Playwright 403 error can occur in any use case, these techniques will focus on fixing the error when web scraping.
1. Avoid IP Bans With Proxies
Proxies are servers that act as a bridge between your web scraper and the target website. When web scraping, a Playwright proxy enables you to route your requests through a different IP address, allowing you to retrieve data anonymously. This can also help avoid rate limiting, one cause of the Playwright 403 error.
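To implement a proxy with Playwright, pass a proxy option when launching the browser. The proxy address goes in the server field; if the proxy requires authentication, supply the username and password as separate fields alongside it.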
Below is a basic example demonstrating how to set a Playwright proxy in Python, using HTTPbin as the target website and grabbing a free proxy from FreeProxyList.
from playwright.async_api import async_playwright
import asyncio

async def main():
    async with async_playwright() as playwright:
        # Launch Chromium and route its traffic through the proxy
        browser = await playwright.chromium.launch(
            proxy={
                'server': "181.129.43.3:8080",
            },
        )
        context = await browser.new_context()
        page = await context.new_page()
        # HTTPbin echoes back the IP address the request came from
        await page.goto("https://httpbin.org/ip")
        html_content = await page.content()
        print(html_content)
        await context.close()
        await browser.close()

asyncio.run(main())
While the code snippet above shows the use of a free proxy, it's important to highlight that free proxies aren't reliable and rarely work in real-world use cases. Therefore, premium proxies become crucial for yielding optimal results.
With anti-bot systems continuously evolving, rotating your premium proxies is essential to overcome rate limits and IP bans.
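As a rough illustration of rotation with authenticated proxies, the sketch below cycles through a small pool and launches the browser with the next proxy for each URL. The hosts and credentials in `PROXY_POOL` are placeholders, and `proxy_rotation` and `scrape` are hypothetical helper names; only the `server`, `username`, and `password` fields are Playwright's own proxy settings.

```python
import itertools
from typing import Dict, Iterator, List

# Placeholder pool: replace the hosts and credentials with your provider's values.
PROXY_POOL: List[Dict[str, str]] = [
    {"server": "http://proxy1.example.com:8000", "username": "user", "password": "pass"},
    {"server": "http://proxy2.example.com:8000", "username": "user", "password": "pass"},
]


def proxy_rotation(pool: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Return an endless round-robin iterator over the proxy pool."""
    if not pool:
        raise ValueError("proxy pool is empty")
    return itertools.cycle(pool)


async def scrape(urls: List[str]) -> List[str]:
    """Fetch each URL through the next proxy in the pool and return the HTML."""
    # Imported lazily so the rotation helper works without Playwright installed.
    from playwright.async_api import async_playwright

    proxies = proxy_rotation(PROXY_POOL)
    pages_html: List[str] = []
    async with async_playwright() as playwright:
        for url in urls:
            # One browser launch per URL keeps the example simple; each launch
            # uses the next proxy, with credentials passed as separate fields.
            browser = await playwright.chromium.launch(proxy=next(proxies))
            try:
                page = await browser.new_page()
                await page.goto(url)
                pages_html.append(await page.content())
            finally:
                await browser.close()
    return pages_html
```

Relaunching the browser for every URL is slow but easy to reason about; reusing one browser per proxy would be a natural refinement.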
That seems like a lot, right?
Well, it doesn't have to be. ZenRows, a web scraping API, offers premium proxies and rotates them automatically under the hood. It can complement or replace Playwright because it provides the same headless browser functionality with everything you need to scrape without getting blocked.
To try ZenRows for free, sign up, and you'll get redirected to the Request Builder page.
Paste your target URL, check the box for Premium Proxies to implement proxies, and activate the JavaScript Rendering boost mode if needed.
Then, select a language (Python), and you’ll get your script ready to try. Run it, and you'll get access to the HTML content of your target website.
# pip install requests
import requests
url = "https://httpbin.io/anything"
proxy = "http://<YOUR_ZENROWS_API_KEY>:js_render=true&[email protected]:8001"
proxies = {"http": proxy, "https": proxy}
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)
Awesome, right? ZenRows makes scraping easy.
2. Customize Your User Agent
The User-Agent (UA) string is a crucial component of the HTTP headers sent with every request. It works like a calling card, telling the target server about the operating system, browser type, and other relevant details of the requesting web client.
Websites leverage this information to tailor their responses and features. So, if you carry a bot-like UA string, you'll likely encounter the Playwright 403 error.
However, emulating the UA of popular browsers like Chrome or Firefox can increase your chances of avoiding detection.
To customize a Playwright User Agent, set the userAgent property in the browser context. Below is a quick example showing how to change Playwright's UA to that of a Firefox browser in JavaScript.
const { chromium } = require("playwright");

(async () => {
  // Launch the Chromium browser
  const browser = await chromium.launch();
  // Create a browser context that reports a Firefox User-Agent
  const context = await browser.newContext({
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
  });
  // Create a new page in the browser context and navigate to target URL
  const page = await context.newPage();
  await page.goto("https://httpbin.io/user-agent");
  // Get the entire page content
  const pageContent = await page.content();
  console.log(pageContent);
  // Close the browser
  await browser.close();
})();
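For Python users, the same idea can be sketched with the `user_agent` argument of `new_context`. The example also picks the UA at random from a small pool, which is an addition for illustration and not something the steps above require. The second string in `USER_AGENTS` is just a sample Chrome UA, and `pick_user_agent` and `show_user_agent` are hypothetical helper names.

```python
import random
from typing import Optional, Sequence

USER_AGENTS: Sequence[str] = (
    # Firefox UA taken from the JavaScript example
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    # Sample Chrome UA, included only to give the pool a second entry
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
)


def pick_user_agent(pool: Sequence[str] = USER_AGENTS,
                    rng: Optional[random.Random] = None) -> str:
    """Pick one User-Agent string from the pool at random."""
    chooser = rng if rng is not None else random
    return chooser.choice(pool)


async def show_user_agent() -> str:
    """Load httpbin's user-agent echo page with a custom UA and return the page HTML."""
    # Imported lazily so pick_user_agent() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        try:
            # The context-level user_agent applies to every page opened in it.
            context = await browser.new_context(user_agent=pick_user_agent())
            page = await context.new_page()
            await page.goto("https://httpbin.io/user-agent")
            return await page.content()
        finally:
            await browser.close()
```

Run it with `asyncio.run(show_user_agent())` and check that the echoed UA matches one of the strings in the pool.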
3. Optimize Your Request Frequency
Websites often implement request rate limits to regulate traffic and control the maximum requests a client can make within specific time frames. Thus, to avoid triggering anti-scraping mechanisms, you must optimize the frequency of your requests.
One way to achieve this is to include delays between requests. Pausing between requests simulates human-like browsing behavior and ultimately reduces the risk of detection.
Below is a basic code snippet showing how to introduce delays between requests using page.waitForTimeout in JavaScript.
const { chromium } = require("playwright");

(async () => {
  // Launch a browser
  const browser = await chromium.launch();
  // Create a new browser context
  const context = await browser.newContext();
  // Create a new page within the context
  const page = await context.newPage();
  try {
    // List of URLs to scrape
    const urls = ["https://example.com", "https://example.com/page2", /* Add more URLs */];
    // Loop through the URLs
    for (const url of urls) {
      // Navigate to url
      await page.goto(url);
      // Introduce a delay before the next request
      await page.waitForTimeout(2000);
    }
  } finally {
    // Close the browser
    await browser.close();
  }
})();
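A fixed two-second pause is itself a regular pattern, so one possible refinement is to add random jitter to each delay. The Python sketch below assumes a base delay of 2000 ms plus up to 1500 ms of jitter; both values are arbitrary, and `jittered_delay_ms` and `crawl` are hypothetical helper names.

```python
import random
from typing import List


def jittered_delay_ms(base_ms: float = 2000, jitter_ms: float = 1500) -> float:
    """Return the base delay plus a random extra between 0 and jitter_ms."""
    return base_ms + random.uniform(0, jitter_ms)


async def crawl(urls: List[str]) -> None:
    """Visit each URL in turn, pausing for a randomized interval after each one."""
    # Imported lazily so jittered_delay_ms() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        try:
            page = await browser.new_page()
            for url in urls:
                await page.goto(url)
                # Python counterpart of page.waitForTimeout in the JavaScript API
                await page.wait_for_timeout(jittered_delay_ms())
        finally:
            await browser.close()
```

Call it with `asyncio.run(crawl(["https://example.com", "https://example.com/page2"]))`.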
4. Leverage a Web Scraping API to Never Get Blocked
The easiest way to avoid the Playwright 403 error is to use a web scraping API. Solutions like ZenRows provide all the features necessary to scrape without getting blocked, including headless browser functionality, geolocation, and advanced anti-bot bypass.
Let's see how ZenRows works with a Cloudflare-protected page as the target URL.
Sign up for free, and you'll get to the Request Builder page:
Input the target URL (in this case, https://www.g2.com/products/asana/reviews), then activate JavaScript rendering and the Premium Proxies add-on.
That'll generate your request code on the right. Copy it to run in your terminal.
Your code should look like this:
curl "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fasana%2Freviews&js_render=true&premium_proxy=true"
Run it, and you'll get the following result.
<!DOCTYPE html><head>
#...
<title>Asana Reviews 2023: Details, Pricing, & Features | G2</title>
#..
Congrats! You've successfully avoided detection using ZenRows.
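If you would rather make the same call from Python than from curl, the request can be sketched with the standard library alone. The endpoint and the `apikey`, `url`, `js_render`, and `premium_proxy` parameters are taken from the curl command above; `build_zenrows_url` and `fetch_html` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"


def build_zenrows_url(api_key: str, target_url: str,
                      js_render: bool = True, premium_proxy: bool = True) -> str:
    """Build the API request URL, percent-encoding the target URL."""
    params = {"apikey": api_key, "url": target_url}
    if js_render:
        params["js_render"] = "true"
    if premium_proxy:
        params["premium_proxy"] = "true"
    return f"{ZENROWS_ENDPOINT}?{urlencode(params)}"


def fetch_html(api_key: str, target_url: str, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_zenrows_url(api_key, target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_html("<YOUR_ZENROWS_API_KEY>", "https://www.g2.com/products/asana/reviews")` with a real key sends the same request as the curl command.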
5. Use the Playwright Stealth Extension
Playwright Stealth is a plugin that extends Playwright with the ability to avoid triggering anti-bot measures. It is a port of the Puppeteer Extra stealth plugin, made available through Playwright Extra, an open-source library that enables the use of plugins with Playwright.
The stealth extension applies various techniques to mask Playwright's automation properties, enabling you to fly under the radar. These techniques are built-in evasion modules that plug specific "property leaks". For example:
- media.codecs modifies codecs to support what Chrome uses.
- navigator.plugins emulates navigator.mimeTypes and navigator.plugins with functional mocks to match standard Chrome used by humans.
However, while Playwright Stealth is a powerful tool, it isn't foolproof and doesn't work against advanced anti-bot protection.
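To make the idea of plugging a property leak concrete, here is a hand-rolled sketch of a single evasion in Python. It is not the stealth plugin and covers far less: it only overrides `navigator.webdriver`, one well-known automation flag, using Playwright's `add_init_script`. The names `HIDE_WEBDRIVER_JS` and `read_webdriver_flag` are made up for this example.

```python
# JavaScript injected before any page script runs; it redefines
# navigator.webdriver so that reading it yields undefined.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
  get: () => undefined,
});
"""


async def read_webdriver_flag(url: str):
    """Open the URL with the patch applied and return what navigator.webdriver reports."""
    # Imported lazily so the script constant can be inspected without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        try:
            context = await browser.new_context()
            # Init scripts run in every page of the context before the site's own code.
            await context.add_init_script(HIDE_WEBDRIVER_JS)
            page = await context.new_page()
            await page.goto(url)
            return await page.evaluate("navigator.webdriver")
        finally:
            await browser.close()
```

The stealth plugin bundles many patches of this kind, which is why a single override like this one is unlikely to be enough on its own.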
Conclusion
The 403 error is a common occurrence when scraping with Playwright. If the error persists after trying techniques like proxies, user agent spoofing, or Playwright Stealth, consider ZenRows for guaranteed results.
Sign up now to try ZenRows for free.