
FlareSolverr Tutorial: Scrape Cloudflare Sites

May 4, 2023 · 9 min read

Cloudflare is a popular web security service protecting many websites against unwanted traffic, such as DDoS attacks. Its advanced anti-bot system analyzes incoming requests and uses evolving algorithms to detect and block bots. That's why your web scraper is getting the "Access Denied" error.

Fortunately, you'll learn how to bypass Cloudflare using FlareSolverr in this article.

Let's get started!

What Is FlareSolverr

FlareSolverr is an open-source proxy server designed to bypass Cloudflare's anti-bot mechanisms. It emulates an actual browser that can solve challenges, pass security checks, and render website content.

Usually, when you visit a Cloudflare-protected website, it keeps you in the waiting room to solve numerous fingerprinting challenges, CAPTCHAs, or other tests to prove you're human. FlareSolverr automates the bypassing process using Python Selenium and Undetected ChromeDriver to mimic an actual browser and act like a human.

After solving the challenges, the HTML code and cookies are returned to the client for use with other HTTP clients, such as Python Requests.

How to Use FlareSolverr

You can install FlareSolverr in several ways. We'll make it easy with step-by-step instructions.

But before that, let's try to access a website without FlareSolverr. For this example, we'll use NowSecure, a test website that displays a "You Passed" message if you're successful against its challenges.

To follow along, ensure you have Python installed, then install Requests using the following command:

pip install requests

Now, let's import the Requests module, define our target URL, and use the requests.get() method to make a GET request to NowSecure.

import requests

url = "https://nowsecure.nl/"

response = requests.get(url)

You can verify if it works by printing the response status code and its HTML:

print("status_code: " + str(response.status_code)) # prints the response status code
print(response.text) # prints the response content as text

This is the response you'll get:

status_code: 403

<body class="no-js">
    <div class="main-wrapper" role="main">
    <div class="main-content">
        <noscript>
            <div id="challenge-error-title">
                <div class="h2">
                    <span class="icon-wrapper">
                        <div class="heading-icon warning-icon"></div>
                    </span>
                    <span id="challenge-error-text">
                        Enable JavaScript and cookies to continue
                    </span>
                </div>
            </div>

The 403 status code (Forbidden) means the server understood the request but refused to serve it, and the HTML is that of Cloudflare's challenge page. In a nutshell, we've been detected and blocked.
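A scraper can recognize this situation in code instead of treating the challenge page as real content. The sketch below is only a heuristic, not an official Cloudflare interface: it flags a response when the status is 403 and the body contains markers taken from the HTML above. The helper name `looks_like_cloudflare_challenge` and the marker list are assumptions made for illustration.

```python
# Heuristic check for Cloudflare's challenge page (hypothetical helper).
# The markers come from the blocked response shown above; Cloudflare may
# change them at any time, so treat this as a sketch.
CHALLENGE_MARKERS = (
    "challenge-error-text",
    "Enable JavaScript and cookies to continue",
)


def looks_like_cloudflare_challenge(status_code: int, html: str) -> bool:
    """Return True when the response resembles the challenge page above."""
    if status_code != 403:
        return False
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

With the earlier request, calling `looks_like_cloudflare_challenge(response.status_code, response.text)` would tell you whether the URL needs to go through FlareSolverr.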

Luckily, we have FlareSolverr! Let's see it next.

Install FlareSolverr

The most popular approach to setting up FlareSolverr is through a Docker container since the Chromium browser is included within the image. Tools such as Prowlarr and Jackett can also be configured to use a running FlareSolverr instance.

For this tutorial, we'll use a Docker container.

First, install Docker by downloading Docker Desktop for your operating system from the official Docker website.

Run the installation package and follow the instructions. You may need to restart your computer after the installation is complete.

To check if Docker is installed correctly, enter the following command in your terminal:

docker 

You should get something similar to this:

Docker Installation

If Docker isn't installed correctly, you'll get an error message instead.

Second, start the Docker engine by double-clicking the Docker Desktop icon, and you'll be ready to integrate FlareSolverr. On Windows, you may need to install or update Windows Subsystem for Linux (WSL).

Next, download FlareSolverr from the Docker hub by running the following command in your terminal or Command Prompt:

docker pull flaresolverr/flaresolverr

If done correctly, you should see the FlareSolverr image in Docker Desktop's Images tab.

Docker Desktop

Finally, create a new container for FlareSolverr to make it run as an isolated service on your system using the following command:

docker create \
--name=flaresolverr \
-p 8191:8191 \
-v /path/to/flaresolverr/config:/app/config \
flaresolverr/flaresolverr

The command above uses the flaresolverr/flaresolverr image to create a Docker container named "flaresolverr". It maps port 8191 in the container to the same port on your local machine so you can reach the service running inside the container from outside. Lastly, the -v option mounts a directory from the host machine into the container; replace /path/to/flaresolverr/config with a real path on your system.

While the FlareSolverr GitHub repository doesn't explicitly document the API endpoint URL, the default is http://localhost:8191/v1, as seen in the repository's cURL example. Keep this URL in mind because we'll send our requests to it so FlareSolverr can handle the Cloudflare challenges and return the website content.

However, if the FlareSolverr container runs on a different host or port, the URL would differ from the default. You can inspect the container to see its host and port.
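One way to avoid hard-coding the URL is to build it from the host and port you published the container on. This is a minimal sketch; the helper name `flaresolverr_endpoint` and the remote host and port values are made up for illustration.

```python
# Build the FlareSolverr API URL from a host and port (hypothetical helper).
def flaresolverr_endpoint(host: str = "localhost", port: int = 8191) -> str:
    """Return the /v1 endpoint for a FlareSolverr instance."""
    return f"http://{host}:{port}/v1"


# Default setup from this tutorial
default_url = flaresolverr_endpoint()

# Example: container published on another machine and port (made-up values)
remote_url = flaresolverr_endpoint("192.168.1.50", 9000)
```

If you mapped a different host port with -p when creating the container, pass that port here instead of 8191.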

Run FlareSolverr

To run FlareSolverr, start your container using the following command, replacing [container_name] with the actual name of your container (flaresolverr if you used the command above).

docker start [container_name]

To confirm you're running FlareSolverr correctly, visit http://localhost:8191/ in your web browser, and you should get a response similar to this:

{
	"msg": "FlareSolverr is ready!",
	"version": "3.1.2",
	"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
}
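The same readiness check can be scripted, which is handy before starting a long scraping job. The sketch below uses only the standard library and assumes the index page returns the JSON shown above; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_health(body: str) -> bool:
    """Return True if the index response reports that FlareSolverr is ready."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("msg") == "FlareSolverr is ready!"


def flaresolverr_is_ready(base_url: str = "http://localhost:8191/") -> bool:
    """Fetch the index page and check the readiness message."""
    try:
        with urlopen(base_url, timeout=5) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except OSError:
        # Connection refused, timeout, DNS failure, HTTP error, etc.
        return False
```

Calling `flaresolverr_is_ready()` returns False when the container isn't running or the port isn't reachable.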

Scrape with FlareSolverr

If FlareSolverr runs correctly, you can send the URLs you want to scrape to its HTTP server, which returns the web content and cookies.

Therefore, to scrape with FlareSolverr, we need a tool that makes it easy to make HTTP requests. Since the Python Requests library is the de facto standard for making requests, we'll go with it.

To follow along, create a Python file, import Requests, define the FlareSolverr API URL, and specify the content type like this:

import requests

api_url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

Next, define the payload to be sent in the request. In this case, it should contain the command (the HTTP method to use), the URL we want to scrape, and the maximum timeout in milliseconds. We'll use NowSecure, the Cloudflare-protected test website, as the target URL once again.

data = {
    "cmd": "request.get",
    "url": "https://nowsecure.nl/",
    "maxTimeout": 60000
}

Then, send a POST request to the FlareSolverr API, passing in the necessary parameters.

response = requests.post(api_url, headers=headers, json=data)

Lastly, verify it works:

print(response.content)

Putting it all together, you should have the following complete Python code.

import requests

api_url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

data = {
    "cmd": "request.get",
    "url": "https://nowsecure.nl/",
    "maxTimeout": 60000
}

response = requests.post(api_url, headers=headers, json=data)

print(response.content)

Your response should contain values like these:

{
    "status": "ok",
    "message": "Challenge solved!",
    "solution": {
        "url": "https://nowsecure.nl/",
        "status": 200,
        "cookies": [{
            "domain": "nowsecure.nl",
            "expiry": 1681830200,
            "httpOnly": false,
            "name": "cf_chl_rc_m",
            "path": "/",
            "sameSite": "Lax",
            "secure": false,
            "value": "1"
        }],
        "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
        "headers": {},
        "response": "<html lang=\"en\"><head>\n    <!-- Required meta tags -->\n    <meta charset=\"utf-8\">\n ......// ..... <h1>OH YEAH, you passed!</h1>\n    <p class=\"lead\">you passed!</p> .....//..."
    }
}
NowSecure Actual Response

Well done!
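In practice, you'll usually want only the rendered HTML rather than the whole JSON envelope. Here's a minimal sketch that pulls it out using the status, message, and solution fields shown above. The helper name `extract_html` is hypothetical, and raising RuntimeError on failure is just one possible design.

```python
def extract_html(response_data: dict) -> str:
    """Return the rendered HTML from a parsed FlareSolverr response.

    Raises RuntimeError when FlareSolverr reports a failure or the
    target site did not answer with HTTP 200.
    """
    if response_data.get("status") != "ok":
        raise RuntimeError(response_data.get("message", "unknown FlareSolverr error"))
    solution = response_data["solution"]
    if solution["status"] != 200:
        raise RuntimeError(f"target site answered with HTTP {solution['status']}")
    return solution["response"]
```

With the script above, you could call `html = extract_html(response.json())` and then pass the HTML to your parser of choice.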

Cookies with FlareSolverr

Remember, FlareSolverr also returns Cloudflare cookies after solving the challenge. We can retrieve and use them with our HTTP client (Python Requests), which is more efficient if you're making a lot of requests. You can retrieve cookies once for different requests rather than using the heavy Selenium and Undetected ChromeDriver each time.

To retrieve and use Cloudflare cookies with FlareSolverr and Requests, start by making a POST request to FlareSolverr as we did earlier. This time, also import the json module and define your target URL outside of the data dictionary so you can reuse it later.

import requests
import json

url = "https://nowsecure.nl/"
api_url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

data = {
    "cmd": "request.get",
    "url": url,
    "maxTimeout": 60000
}

response = requests.post(api_url, headers=headers, json=data)

Extract and clean the cookies, and then extract the User Agent used by FlareSolverr to access the target URL.

# retrieve the entire JSON response from FlareSolverr
response_data = json.loads(response.content)

# Extract the cookies from the FlareSolverr response
cookies = response_data["solution"]["cookies"]

# Clean the cookies
cookies = {cookie["name"]: cookie["value"] for cookie in cookies}

# Extract the user agent from the FlareSolverr response
user_agent = response_data["solution"]["userAgent"]

The above code parses the JSON response before extracting the cookies and User Agent. It also cleans the cookies by converting them into a dictionary that maps each cookie name to its value, which is a format Requests accepts.

Lastly, make a new GET request to the target URL using the cleaned cookies and FlareSolverr's User Agent.

response = requests.get(url, cookies=cookies, headers={"User-Agent": user_agent})

When you put everything together, your complete code should look like this:

import requests
import json
 
url = "https://nowsecure.nl/"
api_url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
 
data = {
    "cmd": "request.get",
    "url": url,
    "maxTimeout": 60000
}
 
response = requests.post(api_url, headers=headers, json=data)

# retrieve the entire JSON response from FlareSolverr
response_data = json.loads(response.content)
 
# Extract the cookies from the FlareSolverr response
cookies = response_data["solution"]["cookies"]
 
# Clean the cookies
cookies = {cookie["name"]: cookie["value"] for cookie in cookies}
 
# Extract the user agent from the FlareSolverr response
user_agent = response_data["solution"]["userAgent"]

response = requests.get(url, cookies=cookies, headers={"User-Agent": user_agent})

Verify it works by printing the result:

print(response.content)

You should have a result similar to this:

NowSecure result

It worked!
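Reused cookies don't last forever: the sample response earlier includes an "expiry" field holding a Unix timestamp in seconds. The sketch below is one way to decide when to ask FlareSolverr for fresh cookies. It assumes you kept the raw cookie list from the response, and it treats cookies without an "expiry" field as still valid. The helper name is hypothetical.

```python
import time
from typing import Optional


def any_cookie_expired(cookies: list, now: Optional[float] = None) -> bool:
    """Return True if at least one cookie's "expiry" timestamp has passed.

    `cookies` is the raw list from response_data["solution"]["cookies"],
    before it is flattened into a name/value dictionary.
    """
    current = time.time() if now is None else now
    return any(
        "expiry" in cookie and cookie["expiry"] <= current
        for cookie in cookies
    )
```

When this returns True, send the URL through FlareSolverr again before making further plain Requests calls.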

Manage Sessions

Sessions let FlareSolverr keep a browser instance alive so that Cloudflare cookies are retained until you're done with them. That way, you don't have to solve challenges again or pass cookies along with each request.

Check out the session commands in the FlareSolverr documentation for a detailed explanation of creating, listing, and destroying sessions.
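As a rough sketch, the payloads for a session's lifecycle could look like the following. The sessions.create, sessions.list, and sessions.destroy commands and the session field are based on the FlareSolverr README, so double-check them against the version you run. The session name is a made-up value.

```python
# Payloads for FlareSolverr's session commands. POST each one as JSON to
# the API URL (http://localhost:8191/v1 in this tutorial).
SESSION_ID = "my-session"  # hypothetical name; any unique string

create_session = {"cmd": "sessions.create", "session": SESSION_ID}

get_with_session = {
    "cmd": "request.get",
    "url": "https://nowsecure.nl/",
    "session": SESSION_ID,  # reuse the browser (and its cookies) created above
    "maxTimeout": 60000,
}

list_sessions = {"cmd": "sessions.list"}

destroy_session = {"cmd": "sessions.destroy", "session": SESSION_ID}
```

Send create_session first, reuse get_with_session for as many URLs as you need, and finish with destroy_session so the browser doesn't keep consuming memory.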

Make POST Requests

If you need to submit a form with POST data, you'll have to make a POST request. With FlareSolverr, that's similar to a GET request: replace request.get with request.post in the cmd field and include the postData parameter, a URL-encoded string such as a=b&c=d.

Here's an example:

import requests

api_url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

# form fields as a URL-encoded string (placeholder values)
post_data = "field1=value1&field2=value2"

data = {
    "cmd": "request.post",
    "url": "https://www.example.com/POST",
    "postData": post_data,
    "maxTimeout": 60000
}

response = requests.post(api_url, headers=headers, json=data)

print(response.text)
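FlareSolverr's README describes postData as an application/x-www-form-urlencoded string, so writing it by hand gets error-prone once values contain spaces or special characters. A small sketch using the standard library's urlencode follows; the form field names and values are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical form fields, for illustration only
form_fields = {"username": "jane", "query": "web scraping"}

# urlencode escapes special characters and joins the pairs with "&"
post_data = urlencode(form_fields)
print(post_data)  # username=jane&query=web+scraping
```

The resulting string can be used as the value of postData in the payload.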

Common Errors

When using FlareSolverr, users often encounter some frustrating errors. Let's see the most common ones and how to fix them!

The Cookies Provided by FlareSolverr Are Not Valid

This error occurs when the cookies returned by FlareSolverr don't work. That happens when the IP address FlareSolverr used to solve the challenge (for example, from inside its Docker container) differs from the IP address of the client that reuses the cookies. In other words: when they're running on different networks.

That's often the case when using proxies or VPNs since FlareSolverr doesn't currently support them. To fix that, try disabling the proxy or VPN. If that's not possible, refer to this issue.

Challenge Detected but FlareSolverr Is Not Configured

This error is common with Jackett, where Cloudflare protects some indexers. To solve it, install the FlareSolverr service and configure the FlareSolverr API URL.

FlareSolverr Limitations and Solution

While FlareSolverr is a great tool for bypassing Cloudflare challenges, open-source solutions like it often struggle to keep up with Cloudflare's frequently evolving bot management system. Here's an example.

Let's try our previous code against a website with more advanced Cloudflare protection, like Glassdoor.

Our script looks like this:

import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

data = {
    "cmd": "request.get",
    "url": "https://www.glassdoor.com/Overview/Working-at-Google-EI_IE9079.11,17.htm",
    "maxTimeout": 60000
}

response = requests.post(url, headers=headers, json=data)

print(response.content)

And we get the following error message:

b'{"status": "error", "message": "Error: Error solving the challenge. Timeout after 60.0 seconds.", "startTimestamp": 1681908319571, "endTimestamp": 1681908380332, "version": "3.1.2"}'

The result above shows that FlareSolverr couldn't solve this more advanced Cloudflare challenge before the timeout expired.
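Rather than printing the raw bytes, a script can parse the error and report how long FlareSolverr spent before giving up. The sketch below works on the exact response shown above and assumes the two timestamps are in milliseconds, which their magnitude suggests.

```python
import json

raw = b'{"status": "error", "message": "Error: Error solving the challenge. Timeout after 60.0 seconds.", "startTimestamp": 1681908319571, "endTimestamp": 1681908380332, "version": "3.1.2"}'

result = json.loads(raw)

# Timestamps are in milliseconds, so convert the difference to seconds
elapsed_seconds = (result["endTimestamp"] - result["startTimestamp"]) / 1000

if result["status"] == "error":
    print(f"FlareSolverr failed after {elapsed_seconds:.1f}s: {result['message']}")
```

A check like this lets your scraper fall back to another strategy instead of treating the error payload as page content.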

Fortunately, ZenRows, a constantly evolving web scraping solution, offers a way out. Let's see how it does against Glassdoor, where FlareSolverr failed.

To use ZenRows, sign up to get your free API key. Then, install the ZenRows SDK using the following command:

pip install zenrows

Next, import the ZenRows client and create a new instance using your API key.

from zenrows import ZenRowsClient

# create a new ZenRowsClient instance
client = ZenRowsClient("Your_API_Key")

Specify your target URL and set the necessary parameters. To bypass anti-bot measures, you must set "antibot": "true", "premium_proxy": "true", and "js_render": "true".

url = "https://www.glassdoor.com/Overview/Working-at-Google-EI_IE9079.11,17.htm"
params = {"js_render":"true","antibot":"true","premium_proxy":"true"}

Lastly, make a GET request using the predefined parameters.

response = client.get(url, params=params)

Your complete code should look like this, to which we added a print:

from zenrows import ZenRowsClient
 
#create new zenrowsclient instance
client = ZenRowsClient("Your_API_Key")
 
url = "https://www.glassdoor.com/Overview/Working-at-Google-EI_IE9079.11,17.htm"
#define the necessary parameters
params = {"js_render":"true","antibot":"true","premium_proxy":"true"}
 
#make a get request 
response = client.get(url, params=params)

print(response.text)

Run the code, and you'll get the following result:

Glassdoor Success

It's great to bypass any level of Cloudflare protection, right?

Conclusion

FlareSolverr is a great tool for solving Cloudflare challenges. However, Cloudflare's bot detection system is updated frequently, and FlareSolverr doesn't always keep pace. That's why it only works on websites with less advanced protection.

In these cases, consider using always-evolving solutions like ZenRows, which succeeds where FlareSolverr failed. Are you starting a new project? Sign up to get your free API key today.
