How to Bypass Akamai With Puppeteer

June 7, 2024 · 8 min read

Web scraping with Puppeteer can be challenging, especially when running into Akamai's defenses. With its advanced anti-bot system and strict Web Application Firewall (WAF), Akamai poses a serious challenge for any web scraper.

Don't give up! In this guide, you'll learn how to bypass Akamai when web scraping with Puppeteer.

What Is Akamai?

Akamai is a content delivery network (CDN) and cloud security platform that many websites use to improve performance and block malicious attacks. Its WAF is designed to protect web servers from threats like zero-day attacks, distributed denial-of-service (DDoS) attacks, and SQL injection, and its bot detection targets practically any traffic that isn't human.

Unfortunately, that includes your scraping bot, however legitimate your intentions are. So before bypassing Akamai with Puppeteer, let's take a step back and understand how it works.

How Does Akamai Work?

When a user requests content from an Akamai-protected website, one of Akamai's edge servers intercepts the request and analyzes its legitimacy before relaying it to the origin server.

Scrapers usually don't make it past this analysis stage, because Akamai applies several detection techniques in real time, including rate control analysis, IP reputation analysis, JavaScript challenges, and behavioral analysis.
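Rate control is the simplest of these techniques to picture. The sketch below is a generic sliding-window rate limiter, not Akamai's actual implementation: it remembers when each client (keyed here by a hypothetical IP address) sent its recent requests and blocks it once it exceeds maxRequests within windowMs milliseconds. The class name and limits are illustrative assumptions.

```javascript
// A generic sliding-window rate limiter (illustrative only, not Akamai's logic).
// It records request timestamps per client, including blocked attempts, and
// rejects a client once it exceeds maxRequests within the last windowMs ms.
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.history = new Map(); // clientId -> array of request timestamps
  }

  // Returns true if the request is allowed, false if it should be blocked.
  allow(clientId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.history.get(clientId) ?? []).filter((t) => t > cutoff);
    recent.push(now);
    this.history.set(clientId, recent);
    return recent.length <= this.maxRequests;
  }
}
```

A scraper firing requests in a tight loop trips a check like this almost immediately, which is why request pacing and the number of IP addresses you send from both matter.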


Why Puppeteer Alone Is Not Enough to Bypass Akamai

Puppeteer is a Node.js library with a high-level API for controlling Chrome or Chromium, usually in headless mode. It lets you automate web interactions such as navigating pages, clicking, and filling forms, and it renders JavaScript just like a regular browser.

Puppeteer is useful for web scraping, but on its own it isn't enough to scrape Akamai-protected websites. The anti-bot system can easily detect Puppeteer's automation properties, such as the navigator.webdriver flag that automated Chrome sets to true.
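To make "automation properties" more concrete, here's a simplified sketch of two client-side checks an anti-bot script could run. It's an illustration under assumptions, not Akamai's detection code: automationSignals inspects the navigator.webdriver flag and the User-Agent string, and readBrowserSignals (a hypothetical helper) collects those values from a Puppeteer page with page.evaluate().

```javascript
// Simplified illustration of client-side automation signals.
// Real anti-bot systems such as Akamai combine many more signals than these.
function automationSignals({ webdriver, userAgent }) {
  const reasons = [];
  // Chrome sets navigator.webdriver to true when it's controlled by automation.
  if (webdriver === true) reasons.push('navigator.webdriver is true');
  // Headless Chrome can advertise itself in the User-Agent string.
  if (/HeadlessChrome/.test(userAgent)) reasons.push('User-Agent contains HeadlessChrome');
  return reasons;
}

// Hypothetical helper: reads the same values from a live Puppeteer Page.
async function readBrowserSignals(page) {
  // The callback runs inside the browser, where navigator is defined.
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
  }));
}
```

With a default puppeteer.launch(), you'd likely see navigator.webdriver reported as true, and in headless mode the User-Agent typically includes "HeadlessChrome", so both checks would fire on automationSignals(await readBrowserSignals(page)).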

As mentioned, Akamai uses behavioral analysis to identify bot traffic. Puppeteer can automate user interactions, but those interactions don't necessarily mimic human behavior, which makes them easy for Akamai to spot.

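IP reputation is another weak spot. Scrapers often run from data center or cloud IP addresses that Akamai's IP reputation analysis already associates with automated traffic, and Puppeteer does nothing to change the IP address your requests come from.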

See for yourself. Try to scrape an Akamai-protected webpage (https://www.kickz.com/de) using Puppeteer:

sample.js
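import puppeteer from 'puppeteer';

(async () => {
  // launch the browser and open a new blank page
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // set the screen size before navigating so the page renders at this size
    await page.setViewport({ width: 1080, height: 1024 });

    // navigate to the Akamai-protected page and log the HTTP status code
    const response = await page.goto('https://www.kickz.com/de');
    console.log('Status:', response?.status());

    // take a screenshot of whatever the server returned
    await page.screenshot({ path: 'kickz.png' });
  } finally {
    // close the browser even if navigation fails
    await browser.close();
  }
})();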

You'll end up with the following result:

result

The screenshot shows that Puppeteer was denied access: Akamai detected that the request came from an automation tool.

To overcome this obstacle, you need additional strategies and tools that complement Puppeteer's capabilities.

The methods below are the most effective ones.

Best Methods to Bypass Akamai With Puppeteer

Bypassing Akamai comes down to making your scraper look like a real user browsing naturally. Let's see how you can achieve this.

Method #1: Web Scraping API (Most Effective Option)

Web scraping APIs are the most effective option because they let you extract data from a website with a single API call. They provide an interface for sending requests to a server and retrieving the desired data in your preferred format.

The best ones, like ZenRows, handle the technical side of emulating natural user behavior, including premium proxy rotation, CAPTCHA bypass, optimized headers, and more.

Just like Puppeteer, ZenRows offers headless browser functionality but is much easier to use and scale. You only need to make an API call to render JavaScript, bypass Akamai, and access your desired data.

Let's see how ZenRows performs with the same webpage we tried to scrape earlier.

To get started, create a free ZenRows account, and you'll be directed to the Request Builder page.

Paste your target URL (https://www.kickz.com/de), select the JavaScript Rendering mode, and check the Premium Proxies box to rotate proxies automatically. Then select your programming language, and the builder will generate your request code on the right (this example uses Python).

ZenRows Request Builder

Although this code uses Python's Requests library, you can use any HTTP client. All that matters is that you send your requests to the ZenRows API.

Copy the generated code to your favorite editor. Your new script should look like this:

sample.py
# pip install requests
import requests
 
url = 'https://www.kickz.com/de'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)

Run it, and you'll get the page's HTML content. Here's the title element from that output:

Output
<title>
    KICKZ.COM. Der Online Shop für Streetwear, Sneaker und Basketball Gear
</title>

Impressive, isn't it? ZenRows simplifies the process of bypassing Akamai.
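Because the API works with any HTTP client, you can also call it from Node.js, the same runtime Puppeteer uses. The sketch below mirrors the Python script's parameters (url, apikey, js_render, premium_proxy) using the built-in fetch available in Node.js 18 or later. The function names are hypothetical, and treating any non-2xx status as a failure is an assumption of this sketch.

```javascript
// Builds the ZenRows request URL with the same parameters as the Python example.
function buildZenRowsUrl(targetUrl, apiKey) {
  const params = new URLSearchParams({
    url: targetUrl,
    apikey: apiKey,
    js_render: 'true',
    premium_proxy: 'true',
  });
  return `https://api.zenrows.com/v1/?${params}`;
}

// Sends the request with Node's built-in fetch and returns the page's HTML.
async function fetchWithZenRows(targetUrl, apiKey) {
  const response = await fetch(buildZenRowsUrl(targetUrl, apiKey));
  if (!response.ok) {
    throw new Error(`ZenRows request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, fetchWithZenRows('https://www.kickz.com/de', '<YOUR_ZENROWS_API_KEY>').then(console.log) prints the same HTML as the Python script. URLSearchParams takes care of percent-encoding the target URL.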

Method #2: Puppeteer-extra-plugin-stealth

Puppeteer-extra-plugin-stealth is a plugin for puppeteer-extra, a wrapper that adds plugin support to Puppeteer. It hides properties that would otherwise flag the headless browser as a bot, making it harder for websites to detect and block Puppeteer's activities.

The plugin uses various evasion modules that modify browser fingerprints and behaviors so they resemble those of a real user's browser.
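Here's a minimal sketch of the usual setup, assuming the puppeteer, puppeteer-extra, and puppeteer-extra-plugin-stealth packages are installed. puppeteer-extra wraps Puppeteer and exposes use() for registering plugins; the function names getStealthPuppeteer and scrapeWithStealth are hypothetical.

```javascript
// Cached promise for the puppeteer-extra instance, so the plugin is registered once.
let stealthPuppeteerPromise;

// Loads puppeteer-extra and registers the stealth plugin.
// Assumes: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
function getStealthPuppeteer() {
  stealthPuppeteerPromise ??= (async () => {
    // Dynamic import() works from both CommonJS and ES module files.
    const { default: puppeteer } = await import('puppeteer-extra');
    const { default: StealthPlugin } = await import('puppeteer-extra-plugin-stealth');
    // The evasion modules apply to browsers launched through puppeteer-extra.
    puppeteer.use(StealthPlugin());
    return puppeteer;
  })();
  return stealthPuppeteerPromise;
}

// Repeats the earlier test with stealth enabled and returns the HTTP status.
async function scrapeWithStealth(url, screenshotPath = 'kickz-stealth.png') {
  const puppeteer = await getStealthPuppeteer();
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1080, height: 1024 });
    const response = await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: screenshotPath });
    return response?.status();
  } finally {
    await browser.close();
  }
}
```

For example, scrapeWithStealth('https://www.kickz.com/de').then((status) => console.log('Status:', status)) reruns the earlier test with the plugin enabled. The plugin doesn't change the reputation of your IP address, so it may still not be enough against Akamai on its own.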

Check out this Puppeteer stealth tutorial to learn more.

Method #3: Use Premium Proxies

Premium proxies can be pivotal in getting past Akamai's security systems. They act as intermediaries between your Puppeteer scraper and the target server, so routing your requests through them masks your real IP address and location.

Also, it's important to rotate between multiple proxies, as this allows you to distribute traffic across different IP addresses and appear as if your requests originate from different users or devices.
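The sketch below shows one way to route Puppeteer through rotating proxies, assuming authenticated HTTP proxies written as http://user:pass@host:port. The proxy hostnames and the parseProxy, createProxyRotator, and visitThroughProxy helpers are hypothetical; --proxy-server is a standard Chromium flag, and page.authenticate() is Puppeteer's method for supplying proxy credentials.

```javascript
// Splits a proxy URL such as http://user:pass@host:port into a --proxy-server
// value plus optional credentials.
function parseProxy(proxyUrl) {
  const { protocol, host, username, password } = new URL(proxyUrl);
  return {
    server: `${protocol}//${host}`,
    username: decodeURIComponent(username),
    password: decodeURIComponent(password),
  };
}

// Returns a function that hands out proxies in round-robin order.
function createProxyRotator(proxyUrls) {
  let index = 0;
  return () => parseProxy(proxyUrls[index++ % proxyUrls.length]);
}

// Opens url in a fresh browser routed through the next proxy in the rotation.
async function visitThroughProxy(nextProxy, url) {
  const { default: puppeteer } = await import('puppeteer');
  const proxy = nextProxy();
  // --proxy-server applies to the whole browser, so each visit launches a new one.
  const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy.server}`] });
  try {
    const page = await browser.newPage();
    if (proxy.username) {
      // Answers the proxy's authentication challenge.
      await page.authenticate({ username: proxy.username, password: proxy.password });
    }
    const response = await page.goto(url);
    return response?.status();
  } finally {
    await browser.close();
  }
}
```

Launching a fresh browser per visit is slower than reusing one, but it keeps every visit tied to a single proxy. Usage would look like visitThroughProxy(createProxyRotator(['http://user:pass@proxy1.example.com:8000', 'http://user:pass@proxy2.example.com:8000']), 'https://www.kickz.com/de'), keeping the rotator in a variable so consecutive calls move through the list.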

There are different types of proxies, including residential and data center proxies. Residential proxies are often recommended because internet service providers assign their IP addresses to real household devices, so integrating them with Puppeteer makes your traffic much harder to flag.

Check out this guide on using a Puppeteer proxy to learn more about implementing proxies in Puppeteer.

Method #4: Optimize Your Request Headers

Headers are additional information sent along with HTTP requests between a client and a server. They provide metadata about the request, including details such as the user agent, content type, encoding, and more. Websites often use this information to tailor responses.

Puppeteer's default headers can differ from a regular browser's, for example by omitting headers a real browser sends or, in headless mode, by including "HeadlessChrome" in the User-Agent. Akamai can easily detect and block these bot-like headers. However, you can optimize your headers to mimic a regular browser and reduce the chance of detection.
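Here's a minimal sketch built on two real Puppeteer methods, page.setUserAgent() and page.setExtraHTTPHeaders(). The header values are assumptions: the Accept-Language value is a hypothetical choice matching the German target site, and normalizeUserAgent only strips the "HeadlessChrome" token rather than inventing a new User-Agent.

```javascript
// Hypothetical extra headers; copy real values from your own browser's DevTools.
const BROWSER_HEADERS = {
  'Accept-Language': 'de-DE,de;q=0.9,en;q=0.8',
};

// Replaces the "HeadlessChrome" token that headless builds can put in the User-Agent.
function normalizeUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// Applies the cleaned User-Agent and extra headers to a Puppeteer page.
async function applyBrowserHeaders(page) {
  // Start from the running browser's real User-Agent so the version stays consistent.
  const currentUserAgent = await page.browser().userAgent();
  await page.setUserAgent(normalizeUserAgent(currentUserAgent));
  await page.setExtraHTTPHeaders(BROWSER_HEADERS);
}
```

Deriving the User-Agent from the running browser keeps its Chrome version consistent with the build that's actually executing, which a hard-coded string can contradict. Call applyBrowserHeaders(page) before page.goto(), and keep in mind that setExtraHTTPHeaders() applies to every request the page makes.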

Check out this guide on setting Puppeteer headers to learn how to optimize your headers fully.

Conclusion

Akamai presents a formidable challenge for any web scraping task. By using the methods discussed in this tutorial, you can bypass Akamai and retrieve your desired data.

However, proxies, request headers, and Puppeteer Stealth aren't foolproof. For the most reliable way to bypass Akamai, use ZenRows, a web scraping API that provides the complete toolkit to scrape without getting blocked.
