Are you web scraping with Selenium but getting blocked by Akamai's anti-bot? You've come to the right place!
In this guide, you'll learn how Akamai's detection system works and the four best methods to bypass it.
What Is Akamai and How It Works
Akamai is a cloud platform offering web security services that protect websites from cyber threats, including DDoS and DNS attacks, account takeovers, brand impersonation, and more. It also blocks web scraping activity from automated browsers like Selenium.
Akamai gathers thousands of bot-like attributes from many websites. It then analyzes this data with machine learning algorithms to differentiate between humans and bots more efficiently.
The adaptiveness of this security system makes bypassing Akamai difficult during web scraping.
Why Base Selenium Alone Is Not Enough to Bypass Akamai
Selenium is a browser automation tool for testing web applications and scraping web content. Its ability to render JavaScript and simulate user actions makes it an excellent choice for extracting data from dynamic web pages.
However, scraping with Selenium alone is insufficient against Akamai-protected websites because Selenium exposes easily detectable bot-like attributes, such as the `navigator.webdriver` property.
For instance, Selenium won't bypass an Akamai-protected website like the following Similarweb comparison page:
Try it out with the following Python code:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# start Chrome in headless mode
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
# visit the target website
driver.get("https://www.similarweb.com/website/facebook.com/")
# print the page HTML
print(driver.page_source)
# quit the browser
driver.quit()
Selenium got blocked with an "Access Denied" message, as shown:
<html>
<head>
<title>Access Denied</title>
</head>
<body>
<h1>Access Denied</h1>
<p>You don't have permission to access "http://www.similarweb.com/website/facebook.com/" on this server.</p>
<p>Reference #18.c56e5668.1713516274.184af10d</p>
<p>https://errors.edgesuite.net/18.c56e5668.1713516274.184af10d</p>
</body>
</html>
Next, let's see the best methods to beat the Akamai challenge.
Best Methods to Bypass Akamai with Selenium
Akamai's anti-bot measures are advanced, requiring more than just Selenium to bypass. Here are the top solutions to evade Akamai.
Method #1: Use a Web Scraping API
Web scraping APIs are tools that bypass blocking challenges, including CAPTCHAs, IP bans, and other anti-bot protections, during content extraction. ZenRows is a leading web scraping API with an all-in-one solution for extracting data from any web page, including Akamai-protected websites.
It features auto-rotating premium proxies and optimized request headers, and it bypasses CAPTCHAs and any other form of anti-bot measure during scraping.
ZenRows acts as a headless browser for scraping dynamic content with JavaScript instructions. This feature allows you to replace Selenium with ZenRows without worrying about extra memory usage due to browser instances.
Let's use ZenRows to scrape the Akamai-protected website that blocked you previously to see how it works.
Sign up to open the ZenRows Request Builder. Paste the target URL in the link box, toggle the Boost mode to JS Rendering, and activate Premium Proxies. Choose Python as your programming language and select the API connection mode. Copy and paste the generated code into your Python script:
The generated code uses the Requests library as the HTTP client, so make sure you install it with pip:
pip install requests
A slightly modified version of the generated code looks like this:
# pip install requests
import requests
# define your request parameters
params = {
"url": "https://www.similarweb.com/website/facebook.com/",
"apikey": "<YOUR_ZENROWS_API_KEY>",
"js_render": "true",
"premium_proxy": "true",
}
# get the response and print the extracted HTML
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above code bypasses Akamai and outputs the full-page HTML. Here's a prettified version of the extracted HTML with some omitted content:
<html lang="en">
<head>
<meta charset="utf-8">
<!-- -->
<title>facebook.com Traffic Analytics, Ranking & Audience [March 2024] | Similarweb</title>
<!-- -->
</head>
<body class="app-banner-parent">
<!-- -->
</body>
</html>
That's the most straightforward solution to bypass Akamai's anti-bot measures without technical configurations. Let's explore the manual options.
Method #2: Undetected ChromeDriver Plugin
The Undetected ChromeDriver is a Selenium-compatible package that patches ChromeDriver to evade anti-bot systems. It removes bot-like signals, such as the `navigator.webdriver` property, so anti-bot systems won't flag Selenium as a bot during web scraping.
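Here's a minimal sketch of how you might swap the stock ChromeDriver for the patched one. It assumes you've installed the `undetected-chromedriver` package via pip:
# pip install undetected-chromedriver
import undetected_chromedriver as uc
# start a patched Chrome instance instead of the stock ChromeDriver
driver = uc.Chrome()
# visit the target website and print its HTML
driver.get("https://www.similarweb.com/website/facebook.com/")
print(driver.page_source)
# quit the browser
driver.quit()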
Although the plugin can increase your chances of bypassing anti-bots like Akamai, it may not handle Akamai's advanced protection approaches like device fingerprinting and machine learning detection.
Check out our tutorial on how the Undetected ChromeDriver works to learn more.
Method #3: Utilize Premium Proxies
A proxy server routes your request through another IP address so the server thinks you're requesting from a different location. You can mask your IP address with free proxies. However, those are short-lived and unreliable.
The best options are premium residential web scraping proxies, which require authentication credentials like a username and password. Most of these premium services offer an auto-rotation feature that switches your IP address on every request, so the server treats each request as if it came from a different user.
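As an illustration, here's a minimal sketch of routing Selenium traffic through a proxy. The proxy address is a placeholder, and note that Chrome's `--proxy-server` flag doesn't accept embedded credentials, so authenticated premium proxies usually need an extension or a tool like Selenium Wire:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# replace with your proxy server's address and port
proxy_server = "<PROXY_ADDRESS>:<PROXY_PORT>"
# route all Chrome traffic through the proxy
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument(f"--proxy-server=http://{proxy_server}")
driver = webdriver.Chrome(options=chrome_options)
# visit a test endpoint that echoes your IP address
driver.get("https://httpbin.io/ip")
print(driver.page_source)
# quit the browser
driver.quit()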
Check our guide on setting up a proxy with Selenium for a more detailed tutorial.
Method #4: Change Your Request Headers
The request headers describe the source of an HTTP request and influence how the server will handle your request.
Inconsistencies in the request header values can trigger Akamai's anti-bot system because it easily detects deviations from legitimate browser headers. Selenium's default request headers contain bot-like information, such as `HeadlessChrome` in the User Agent string when running in headless mode.
You can configure Selenium's request headers, such as the User Agent, to mimic a legitimate user, as shown in the sketch below. See our guide on changing the Selenium User Agent to learn more.
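Here's a minimal sketch that overrides Selenium's User Agent with a regular desktop Chrome string. The exact string below is only an example; replace it with a current one that matches your Chrome version:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# a sample desktop Chrome User Agent (replace with a current one)
user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/123.0.0.0 Safari/537.36"
)
# apply the custom User Agent to Chrome
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument(f"--user-agent={user_agent}")
driver = webdriver.Chrome(options=chrome_options)
# visit a test endpoint that echoes the received User Agent
driver.get("https://httpbin.io/user-agent")
print(driver.page_source)
# quit the browser
driver.quit()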
Conclusion
In this article, you've learned the four best methods of bypassing Akamai with Selenium. While the web scraping API option is the best solution, other methods, including the Undetected ChromeDriver, premium proxy setup, and request header configuration, can also work, especially when combined.
Since a web scraping API is the ultimate solution, we recommend using ZenRows to handle all setups required to bypass Akamai and other anti-bots. It allows you to focus on scraping the desired content without getting blocked. Try ZenRows for free!