
How to bypass Akamai

By Ander · August 19, 2022 · 11 min read
Ander is a web developer who has worked at startups for 12+ years. He began scraping social media even before influencers were a thing. Geek to the core.

Web scraping is an old and still common technique to extract data. Akamai Bot Manager, like similar products from other vendors, tries to mitigate the problems that scraping might cause. Its goal is to stop attacks such as DDoS or fraudulent authentication attempts. Our goal, for educational purposes, is to bypass Akamai.

Scrapers aren't its main target, but they might get blocked anyway. Differentiating good and bad bots is not an easy task. Let's see what these systems do and how, and learn how to bypass Akamai Bot Manager!

What is bot detection software?

Bot detection, often delivered as part of a Web Application Firewall (WAF) or anti-scraping protection, is a group of techniques to classify and detect bots.

In the past, the measures involved detecting high-load IPs and checking headers. That allowed the defensive system to block most of the scraping traffic.

As those techniques evolved, so did the scraping world. Many scrapers tried to bypass those measures, causing the Bot-Detection industry to get better.

Years later, anti-scraping software started to include passive and active approaches. Passive as in storing IPs and botnets and identifying them for each request. Active as in monitoring the browser and the user activity, or feeding browsing history to machine learning programs.

According to an official white paper (page 13), Akamai was already doing Behavioral Analysis in 2016. They've been evolving since then, as have all the bot-detection software vendors. They now offer a bot scoring model for the site owner to fine-tune their blocking aggressiveness.

What does Akamai Bot Manager do?

Akamai Bot Manager's primary goal is to stop the most dangerous, evasive bots before they erode customer trust. That covers a wide variety of actions, web scraping included.

To achieve that goal, they use a mix of techniques. They maintain a bot directory, apply AI to detect new kinds of bots and, since they are present on many high-traffic sites, they gain collective intelligence.

All this security also applies to more aggressive attacks: DDoS, fake account creation, or replicating sites with phishing intent. Telling those purposes apart is a crucial aspect.

The two critical aspects of bot management are:
  1. Distinguish humans from bots.
  2. Differentiate good bots from bad bots.

Not all bots are malicious. Nobody wants to leave Google out, and they crawl and scrape too. So, how do they do it?

Frustrated that your web scrapers are blocked again and again?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

How does Akamai Bot Manager work?

As mentioned above, they use a wide variety of techniques. We'll explain some of them before we dive into the technical implementation. And learn how to bypass them, of course.

These are the moving parts to bypass Akamai Bot Manager:
  • Botnets. Maintain historical data on known bots and feed their system. The same range of IPs, common mistakes in user-agent naming, or similar patterns. Any of these might give away a botnet. Once identified and recorded, blocking that pattern feels safe. No humans would browse like that.
  • IPs. Blocking IPs might sound like the easiest approach. It's not that simple. IPs don't change ownership often, but who uses them does. And attackers might use IPs that belong to common ISPs, masking their origin and intent. A regular user might get that IP address a week later; by then, they should be able to access the content without blocks.
  • CAPTCHAs. The best procedure for telling humans from bots, right? Again, a controversial one. By using CAPTCHAs in every session, a site would drive many users away. Nobody wants to solve them every other request. As a defensive tactic, CAPTCHAs will appear only if the traffic is suspicious.
  • Browser and sensor data. They inject a JavaScript file that watches the session and runs some checks, then sends all the data to the server for processing. We will see later how it's done and what data gets sent.
  • Browser fingerprinting. As with the botnets, common structures or patterns can give you away. Scrapers might change and mask their requests. But some details (browser or hardware-related) are hard to hide. And Akamai will take advantage of that.
  • Behavioral Analysis. They compare historical user behavior on their sites, checking patterns and common actions. For example, users can visit products directly from time to time. But if they never go to a category or search page, it might trigger an alert.

Scraping detection is not black or white. Akamai Bot Manager combines all the above, and several others. Then, based on the site's setup, it will decide whether to block a user. Bot detection services use a mix of server-side and browser-side detection techniques.

If we want to skip Akamai Bot Manager, we first have to understand how it uses them. Or else, face the "Access Denied" page:

Akamai Access Denied
Ever got a request blocked? Stay with us to avoid Akamai's blocks.

We can only guess how Akamai does the server-side detection. But we can take a look at their client side.

It will get technical from here. Brace yourselves!

Akamai's JavaScript challenge explained

How Akamai Pixel Data Initiator


As we can see in the image above, the script triggers a POST request with a huge payload. Understanding this payload is crucial if we want to bypass Akamai Bot Detection. It won't be easy.

Deobfuscate the challenge

You can download the file here. To see it live, visit KICKZ and look for the file on DevTools. You won't understand a thing, don't worry… that's the idea!

First, run the content on a JavaScript Deobfuscator. That will convert the weird characters into strings. Then, we need to replace the references to the initial array with those strings.

To make things harder, they don't declare variables or object keys with plain names. They use indirection: referencing that array with the corresponding index.

We haven't found an online tool that nails the replacement process. But you can do the following:
  1. Cut the _acxj variable from the generated code.
  2. Create a file and place that variable.
  3. Place the rest of the code in another variable.
  4. Replace (imperfectly) all references to the array; see the code below.
  5. Review the result, since some replacements will fail.
// Array extracted from the obfuscated script (truncated here).
var _acxj = ['csh', 'RealPlayer Version Plugin', 'then', /* ... */];
// The rest of the deobfuscated code goes into a template string.
const code = `var _cf = _cf || [], ...`;
// First replace property accesses like obj[_acxj[2]] with obj.then,
// then standalone references like _acxj[0] with the string "csh".
const result = code
	.replace(/\[_acxj\[(\d+)\]\]/g, (_, i) => `.${_acxj[i]}`)
	.replace(/_acxj\[(\d+)\]/g, (_, i) => JSON.stringify(_acxj[i]));

It will need some manual adjustments since it's a clumsy attempt. A proper replacement would need more details and exceptions. Download our final version to see how it looks.

To save you time, we've done that. The original file changes frequently, so this result might not be the same now. But it will help you understand what data they send to the server and how.

Akamai's sensor data

Akamai Sensor Data

You can see above the data sent for processing. We can take as examples the items highlighted in red. From their content, we can guess where the first two come from: user agent and screen size. The third one looks like a JSON object, but we cannot know what it represents just by its keys. Let's find out!

The first key, cpen, is present in the script; a quick find on the file will tell us so. Here is the line that references it:

var t = [],
	// 1 if the PhantomJS-only callPhantom function exists, 0 otherwise.
	a = window.callPhantom ? 1 : 0;
t.push(",cpen:" + a);

What does it mean? The script checks if callPhantom exists. A quick search on Google tells us that it's a feature that PhantomJS introduces. Meaning that sending cpen:1 is probably an alert for Akamai. No legit browser implements that function.

If you check the next lines, you'll see that they keep sending browser data. window.opera, for example, should never be true if the browser is not Opera. And mozInnerScreenY only exists on Firefox browsers. Do you see a pattern? No single data point is a deal breaker (well, maybe the PhantomJS one), but they reveal a lot when analyzed together!

A function called bd generates all these data points. If we look for its usage, we arrive at a line with many variables concatenated: n + "," + o + "," + m + "," + r + "," + c + "," + i + "," + b + "," + bmak.bd(). Believe it or not, o is the screen's available height.

How can we know that? Go to the definition of the variable. Control + click or similar on an IDE will take you there.

The definition itself doesn't tell us anything useful: o = -1. But look a few lines below:

try { 
	o = window.screen ? window.screen.availHeight : -1 
} catch (t) { 
	o = -1 
}

And there you have it! You followed what and how Akamai sends browser/sensor data for backend processing.
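
To recap the pattern, here is a small sketch of similar environment checks. It's our own illustration, not Akamai's code, and apart from cpen the key names are made up:

// Our own illustration of environment checks, not Akamai's code.
// Only "cpen" is a real key from the script; the other names are invented.
var flags = [];
flags.push(",cpen:" + (window.callPhantom ? 1 : 0));         // PhantomJS leftover
flags.push(",opr:" + (window.opera ? 1 : 0));                // should only be truthy on Opera
flags.push(",moz:" + ("mozInnerScreenY" in window ? 1 : 0)); // Firefox-only property
flags.push(",wd:" + (navigator.webdriver ? 1 : 0));          // true in most automated browsers
flags.push(",sah:" + (window.screen ? window.screen.availHeight : -1));
console.log(flags.join("")); // e.g. ",cpen:0,opr:0,moz:0,wd:1,sah:1040"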

We won't be covering all the items, but you get the idea. Apply the same process for any data point you are interested in.

But the most crucial question is: why are we doing this? 🤔

To bypass Akamai's defenses, we must understand how they do it. Then, check what data they use for that. And with that knowledge, find ways to access the page without blocks.

Mask your sensor data

If all your machines send similar data, Akamai might fingerprint them, meaning that it detects and groups them. Same browser vendor, screen size, processing times, browser data. Is there a pattern? Check your data; they are already doing it.

So, how do you avoid it? There are several ways to mask these values, such as Puppeteer stealth.

And how do they do it? It's open source, and we can take a look at the evasions!

There are no evasions for availHeight, so we'll switch to hardwareConcurrency. We picked it because it's simple; most evasions are more complicated.

Let's say that all your production machines are the same. It's usual: same specs, hardware, software, etc. Their concurrency would be the same, for example, "hardwareConcurrency": 4.

It's just a drop in the ocean. But remember that Akamai Bot Manager processes hundreds of data points. We can make it harder for them by switching some.

// Somewhere in your config. 
// There should be a helper function called `sample`. 
const options = {hardwareConcurrency: sample([4, 6, 8, 16])}; 
 
// The evasion itself. 
// Proxy navigator.hardwareConcurrency getter and return a custom value. 
utils.replaceGetterWithProxy( 
	Object.getPrototypeOf(navigator), 
	'hardwareConcurrency', 
	utils.makeHandler().getterValue(options.hardwareConcurrency) 
)

A proxy acts as an intermediary; in this case, for the hardwareConcurrency getter on the navigator object. When accessed, instead of returning the original value, it returns the one we set in the options. That can be, for example, a random number from a list of typical values.

What do we get with this approach? Akamai would see different values for hardwareConcurrency sent randomly. Assuming that we do it for several parameters, it's hard to see a pattern.
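
If you are not using the stealth plugin's helpers, a rough (and more detectable) equivalent is to override the getter yourself before any page script runs. A minimal sketch with plain Puppeteer, assuming a sample helper like the one above:

const puppeteer = require('puppeteer');

// Pick a random value from a list of typical ones.
const sample = (arr) => arr[Math.floor(Math.random() * arr.length)];

(async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	// Override the getter before any page script (including the challenge) runs.
	// Note: naive overrides like this are easier to detect than the proxy-based evasion.
	await page.evaluateOnNewDocument((value) => {
		Object.defineProperty(Object.getPrototypeOf(navigator), 'hardwareConcurrency', {
			get: () => value,
		});
	}, sample([4, 6, 8, 16]));
	await page.goto('https://example.com');
	console.log(await page.evaluate(() => navigator.hardwareConcurrency));
	await browser.close();
})();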

Isn't this a complicated process for Akamai to run on each visit? The good part for everyone is that they do it only once, then set cookies to avoid running the whole process again.

Cookies to avoid continuous challenges

Why is that good for you? Once you obtain the cookies, the next requests should go unchecked. It means that those cookies will bypass the Akamai WAF!

To stay on the safe side, we suggest keeping the same IP throughout, to simulate an actual user session.

The standard cookies used by Akamai are _abck, ak_bmsc, bm_sv, and bm_mi. It's not easy to find information about these. Thanks to cookie policies, some sites list and explain them.

Akamai Cookies

Notice that ak_bmsc is HTTP-only. That means that you cannot access its content from JavaScript. You will need to check the response headers on the sensor data call. For the others, you can check the headers or call document.cookie in the browser.

Akamai _abck Cookie Content
And that cookie content is critical! The sensor data call will allow your request (or not) and generate that cookie for your session. Once obtained, send it with every request to avoid new checks.
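
A common pattern (a sketch, not a guaranteed bypass) is to let a headless browser run the challenge once, extract the cookies, and reuse them in plain HTTP requests from the same IP. Assuming Puppeteer and axios, with a placeholder URL:

const puppeteer = require('puppeteer');
const axios = require('axios');

(async () => {
	const url = 'https://www.example.com/'; // replace with the target site
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.goto(url, { waitUntil: 'networkidle2' }); // let the challenge run
	// page.cookies() also returns HTTP-only cookies such as ak_bmsc.
	const cookies = await page.cookies();
	await browser.close();

	const cookieHeader = cookies.map((c) => `${c.name}=${c.value}`).join('; ');
	// Reuse the session cookies (ideally from the same IP) for further requests.
	// Also send a User-Agent that matches the browser you used to get them.
	const response = await axios.get(url, { headers: { Cookie: cookieHeader } });
	console.log(response.status);
})();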

Wow, that was a lot to take on.

Let's now see how to skip the Akamai WAF by applying the points we just learned.

How to bypass Akamai Bot Manager

Here are some typical practices that will help when scraping Akamai-protected sites. The threshold is not the same for all of them; some sites might be more aggressive. We'll talk about that later.
  1. Follow robots.txt
  2. Good Rotating Proxies
  3. Use headless browsers
  4. HTTP headers
  5. Stealth techniques for headless browsers
  6. Do not contradict the JavaScript challenge

Follow robots.txt

It might sound obvious and simple. And it is. But scraping content blocked by robots.txt might flag your scraper. And, if Akamai finds a pattern, all your requests might get blocked after some time. Remember that Akamai Bot Manager learns from the visitors and their actions.
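
As a quick illustration, here is a naive robots.txt check in Node.js (18+, for the global fetch). A real crawler should use a full parser, since this sketch ignores wildcards, Allow rules, and crawl delays:

// Naive check: does any Disallow rule in the "*" group prefix-match the path?
async function isDisallowed(origin, path) {
	const res = await fetch(new URL('/robots.txt', origin));
	if (!res.ok) return false;
	const lines = (await res.text()).split('\n').map((l) => l.trim());
	let applies = false;
	const rules = [];
	for (const line of lines) {
		const [key, ...rest] = line.split(':');
		const value = rest.join(':').trim();
		if (/^user-agent$/i.test(key)) applies = value === '*';
		else if (applies && /^disallow$/i.test(key) && value) rules.push(value);
	}
	return rules.some((rule) => path.startsWith(rule));
}

// isDisallowed('https://www.example.com', '/private/page').then(console.log);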

Be extremely careful with honeypots. These are pages designed to lure malicious bots. For example, through invisible links that a human would never follow. They can flag you or slow down your process.

Good Rotating Proxies

A good IP address might not bypass Akamai's defenses, but a bad IP will automatically be rejected. The same goes for proxies that don't change their IP: after some requests, Bot Manager will block them. It's just a matter of time.

Good proxies have two properties (non-exclusive):
  • Rotating. They change IPs on each request or after a short time (see the sketch after this list).
  • Origin. It determines where those IPs come from. Akamai Bot Manager will have a hard time blocking an IP used by an ordinary 4G provider. Hard to tell a bot from a human. Usual examples are data center or residential.
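
As an illustration of the rotating part, here is a minimal setup with axios and the https-proxy-agent package; the proxy URLs are placeholders for whatever your provider gives you:

const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');

// Placeholder proxy URLs; a real provider gives you credentials or a rotating endpoint.
const proxies = [
	'http://user:pass@proxy1.example.com:8080',
	'http://user:pass@proxy2.example.com:8080',
];

async function get(url, attempt = 0) {
	const proxy = proxies[attempt % proxies.length]; // rotate the proxy per request
	return axios.get(url, {
		httpsAgent: new HttpsProxyAgent(proxy),
		proxy: false, // let the agent handle tunneling instead of axios' built-in option
	});
}

// get('https://www.example.com/').then((res) => console.log(res.status));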

Use headless browsers

As we've seen, passing the JavaScript challenge is a must, and you can't pass it without a browser. Static scraping (curl, axios in JavaScript, or requests in Python) won't work in the long term. Some sites might work a few times, but not reliably.

Enter headless browsers, driven by tools such as Selenium or Puppeteer. They run a real browser, like Chrome, without the graphical interface. It means that they will download and run the Akamai challenge.
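
For example, a minimal Puppeteer script that loads a page and lets its JavaScript (challenge included) run could look like this (the URL is a placeholder):

const puppeteer = require('puppeteer');

(async () => {
	const browser = await puppeteer.launch({ headless: true });
	const page = await browser.newPage();
	// A real Chrome runs the page's scripts, so the JavaScript challenge executes too.
	await page.goto('https://www.example.com/', { waitUntil: 'networkidle2' });
	const html = await page.content();
	console.log(html.length);
	await browser.close();
})();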

Combined with good proxies, it will look like an actual user. Still, bot detection software can tell them apart.

HTTP headers

Browsers send a set of headers by default; be careful when changing them. If you send the same headers for Chrome and Safari, Akamai can tell that something is wrong.

Change as little as you need, and always check that the whole set makes sense. Let's say you want to add Google as a referrer. You could add referer: https://www.google.com/.

Simple, right? Well, not so fast. By default, Chrome would send sec-fetch-site: none. But if coming from Google, the browser will add sec-fetch-site: cross-site.

A small detail, yes. But something that a real browser would never fail to do.
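
With Puppeteer, for instance, you could send both headers together so the set stays coherent. A minimal sketch; double-check in DevTools that the final header set matches what a real browser would send:

const puppeteer = require('puppeteer');

(async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	// If you claim you came from Google, the fetch metadata must agree with that claim.
	await page.setExtraHTTPHeaders({
		referer: 'https://www.google.com/',
		'sec-fetch-site': 'cross-site', // what a real browser would send with that referer
	});
	await page.goto('https://www.example.com/');
	await browser.close();
})();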

Stealth techniques for headless browsers

We already saw how these work above. These practices consist of modifying how the browser responds to specific challenges. Akamai's detection software will query the browser, hardware, and many other things. To avoid patterns and fingerprinting, the stealth mode overwrites some of them. It adds variation and masks the browser.
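
The usual way to get these evasions with Puppeteer is the puppeteer-extra stealth plugin; a minimal setup looks like this:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Registers the bundled evasions (navigator.webdriver, hardwareConcurrency, etc.).
puppeteer.use(StealthPlugin());

(async () => {
	const browser = await puppeteer.launch({ headless: true });
	const page = await browser.newPage();
	await page.goto('https://www.example.com/');
	await browser.close();
})();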

Some of these actions are detectable. You will have to test and check for the right combination. Each site can have a different configuration.

The good thing is that you already understand how Akamai decides which requests to block. Or at least you know what data they gather for that.

Do not contradict the JavaScript challenge

Again, it might sound obvious, but some data points might be derived from another one or duplicated. If you decide to mask one and forget the other, the Akamai bot detection will see it. And take action, or at least mark it internally.

In the sensor data example image, we can see that it sends window size. Most data points are related: actual screen, available, inner, and outer sizes.

Inner, for example, should never be bigger than outer. Random values will not work here. You'd need a set of actual sizes.
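
If you do override sizes, derive them all from one plausible profile so they never contradict each other. A sketch with made-up but realistic desktop values:

const puppeteer = require('puppeteer');

// One plausible desktop profile; everything derives from it so no value contradicts another.
// Inner sizes stay smaller than the outer window, which fits inside the available screen.
const profile = {
	width: 1920, height: 1080,           // full screen
	availWidth: 1920, availHeight: 1040, // screen minus taskbar
	innerWidth: 1920, innerHeight: 948,  // viewport, never bigger than the outer size
};

(async () => {
	const browser = await puppeteer.launch({
		// Outer window size, consistent with the profile above.
		args: [`--window-size=${profile.width},${profile.availHeight}`],
	});
	const page = await browser.newPage();
	await page.evaluateOnNewDocument((p) => {
		for (const key of ['width', 'height', 'availWidth', 'availHeight']) {
			Object.defineProperty(window.screen, key, { get: () => p[key] });
		}
	}, profile);
	// Viewport (inner size) matching the same profile.
	await page.setViewport({ width: profile.innerWidth, height: profile.innerHeight });
	await page.goto('https://www.example.com/');
	await browser.close();
})();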

The easiest way to bypass Akamai

Sometimes, it's a good plan to let others do the heavy lifting. ZenRows is designed to bypass Akamai Bot Manager or any other anti-scraping system. It allows you to scrape content without worrying about skipping Akamai or others on your own. ZenRows offers both API and Proxy modes; choose the best fit for you.

Focus on building your data extraction system. Get the most out of your data. And forget about handling all the complicated parts we talked about.

Conclusion

It's been quite a ride! We know it went deep, but bot detection is a complex topic.

The main points to bypass Akamai bot detection or other defensive systems are:
  • Good proxies with fresh and rotating IPs.
  • Follow robots.txt.
  • Use headless browsers with stealth mode.
  • Understand Akamai's challenges, so that you can adapt the evasions.

For updated info, you can check their site. It explains, though not in depth, what aspects they cover, and gives some general information. Not much, but it can guide you if anything changes.

If you liked it, you might be interested in our article about how to bypass Cloudflare.

Thanks for reading! We hope that you found this guide helpful. You can sign up for free, try ZenRows, and let us know any questions, comments, or suggestions.

Did you find the content helpful? Spread the word and share it on Twitter, or LinkedIn.

