How to Scrape cf_clearance Cookies from Cloudflare-Protected Websites

Idowu Omisola
May 25, 2025 · 6 min read

The cf_clearance cookie isn't just any cookie: it's the access key to Cloudflare-protected websites. You only need to obtain that cookie and include it in your scraping request to bypass Cloudflare. But how do you go about it?

We'll show you how to extract the cf_clearance cookie from a Cloudflare-protected site with the CF-Clearance-Scraper. You'll then learn how to use this cookie to bypass Cloudflare in your scraper.

Understanding Cloudflare Protection and cf_clearance Cookies

While Cloudflare usually blocks bots with a CAPTCHA box, it also analyzes several other request parameters under the hood. These include a client's fingerprints, JavaScript-solving ability, behavioral patterns, IP address, network traffic logs, and many other data points that indicate whether or not a request is bot-like.

The rule is that a request must pass these challenges before it can access the site behind the Cloudflare firewall. Once it passes, it receives a clearance cookie called cf_clearance, which serves as a pass key to the target site. The request must provide this cf_clearance cookie within the same browsing session. Otherwise, it gets blocked.

Since cf_clearance is bound to a session, a client must maintain the same User Agent and IP address used to obtain the cookie throughout the session. If you use an IP address or a User Agent different from the one used to obtain the cookie, Cloudflare will block your scraper instantly.

Standard HTTP clients like Python's Requests library get blocked because they can't solve Cloudflare's initial challenges. That means they never receive the cf_clearance cookie required to access the target site during scraping.
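
To see this in practice, here's a minimal sketch: a plain GET request with no challenge-solving ability typically receives Cloudflare's challenge page instead of the real content:

Example
import requests

# a plain request without a cf_clearance cookie: Cloudflare serves
# its challenge page rather than the target content
response = requests.get("https://www.scrapingcourse.com/cloudflare-challenge")
print(response.status_code)  # typically 403 on Cloudflare-protected pages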

However, you can manually obtain the cf_clearance cookie and pass it to a request within a session to bypass Cloudflare. Let's see how that works in the next section.


How to Scrape and Use cf_clearance Cookies

You'll learn how to scrape the cf_clearance cookie with CF-Clearance-Scraper, a command-line tool for retrieving the cf_clearance cookie from a Cloudflare-protected website.

You'll then use the scraped cookie to bypass Cloudflare protection with the Requests library.

Let's start with the installation step.

Step 1: Requirements and Installation

The CF-Clearance-Scraper package requires Python 3.10+ and runs a headless Chrome instance under the hood. So, ensure you have recent Python and Chrome versions installed on your machine if you haven't already.
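
You can quickly confirm your installed Python version from the terminal:

Terminal
python --version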

To install CF-Clearance-Scraper, clone its repository with git:

Terminal
git clone https://github.com/Xewdy444/CF-Clearance-Scraper

Navigate into the CF-Clearance-Scraper folder and install the requirements using pip:

Terminal
cd CF-Clearance-Scraper
pip3 install -r requirements.txt

Let's now see how the tool works.

Step 2: Understanding CF-Clearance-Scraper Parameters

Running CF-Clearance-Scraper means executing the main.py file with a required URL argument and a set of optional flags.

We've explained the main parameters below:

Parameter           Role
-f                  The output JSON file to write the scraped cookies to
-t                  The request timeout (in seconds) for retrieving the cookies
-p                  The proxy URL to route the request through
-ua                 The User Agent header used in the cf_clearance scraping request
--disable-http2     Disables the HTTP/2 protocol
--disable-http3     Disables the HTTP/3 protocol
-ac                 Saves all cookies in addition to cf_clearance
URL                 The URL of the Cloudflare-protected site (required)

CF-Clearance-Scraper works better with a User Agent and a proxy. So, in this tutorial, we'll use it with a Chrome User Agent and a proxy.

Here's what the CF-Clearance-Scraper command structure looks like:

Example
python main.py -p <PROXY_ADDRESS> -t <TIMEOUT_IN_SECONDS> -ua "<USER_AGENT_STRING>" -f <COOKIE_FILE.json> <TARGET_URL>

Let's see this command in action in the next step.

Step 3: Scrape the cf_clearance Cookie

You'll scrape the cf_clearance cookie from the Cloudflare Challenge page using the User Agent, proxy, and timeout (60 seconds) parameters, writing the scraped cookies to a cookies.json file. To achieve this, run the main.py file with these parameters as shown:

Terminal
python main.py -p http://190.58.248.86:80 -t 60 -ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36" -f cookies.json https://www.scrapingcourse.com/cloudflare-challenge

The above command returns the cf_clearance cookie and writes all scraped cookies to a cookies.json file within the CF-Clearance-Scraper project folder.

Output
# ... 
[12:40:42] [INFO] Cookie: cf_clearance=KkssR4xQ9xEJwlNtUXQEKkoQl...lgI5

Since you'll use this cookie in a scraping request, running the above command via a Python subprocess is best. You can then retrieve the cf_clearance string from the terminal logs using regex.

Create a new scraper.py file inside the CF-Clearance-Scraper folder. Import the subprocess and re libraries and create a cf_clearance_scraper function that accepts the URL, proxy, and User Agent parameters. Put the command and its arguments into a list, run it with subprocess, and extract the cf_clearance value from the logs using regex:

scraper.py
import subprocess
import re

def cf_clearance_scraper(url, proxy, user_agent):
    # specify the command and pass the parameters
    command = [
        "python",
        "main.py",
        "-p",
        proxy,
        "-t",
        "60",
        "-ua",
        user_agent,
        "-f",
        "cookies.json",
        url,
    ]

    try:
        # run the command and capture output
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

        output = process.stdout

        # extract the cf_clearance value from the logs with regex
        match = re.search(r"cf_clearance=([^\s]+)", output)
        if match:
            cf_clearance = match.group(1)
            return cf_clearance
        else:
            return None

    except Exception as e:
        print(f"Error running the command: {e}")
        return None

Specify the target URL, User Agent, and proxy address. Then pass these values to the cf_clearance_scraper function to execute it:

scraper.py
# ...

# define the target URL, User Agent, and proxy address
target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

# get the cf_clearance cookie
cf_clearance = cf_clearance_scraper(target_url, proxy, user_agent)
print(cf_clearance)

Combine both snippets at this point, and you'll get the following full code:

scraper.py
import subprocess
import re

def cf_clearance_scraper(url, proxy, user_agent):
    # specify the command and pass the parameters
    command = [
        "python",
        "main.py",
        "-p",
        proxy,
        "-t",
        "60",
        "-ua",
        user_agent,
        "-f",
        "cookies.json",
        url,
    ]

    try:
        # run the command and capture output
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

        output = process.stdout

        # extract the cf_clearance value from the logs with regex
        match = re.search(r"cf_clearance=([^\s]+)", output)
        if match:
            cf_clearance = match.group(1)
            return cf_clearance
        else:
            return None

    except Exception as e:
        print(f"Error running the command: {e}")
        return None

# define the target URL, User Agent, and proxy address
target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

# get the cf_clearance cookie
cf_clearance = cf_clearance_scraper(target_url, proxy, user_agent)
print(cf_clearance)

The above code writes cookies to a cookies.json file and outputs the cf_clearance cookie:

Output
MKybX880PCu.GfWLhonkBnG64WBs4ASAXeZ...Tux0eDI

Let's use that cf_clearance cookie to bypass Cloudflare.

Step 4: Using the cf_clearance Cookie in Your Scraper

Now that your scraper extracts the cf_clearance cookie, you'll use it to bypass Cloudflare by persisting it throughout a single scraping session. Let's extend the previous code to achieve this.

Add the Requests library to your imports. You don't need to install it separately because it already comes with the CF-Clearance-Scraper requirements.

Create a scraper function that accepts the URL, cf_clearance, proxy, and User Agent parameters. Instantiate a request session and set it up with the cf_clearance cookie, User Agent, and proxy parameters. Then, request the target URL with the session object:

scraper.py
# ...
import requests

# ...
def scraper(url, cf_clearance, proxy, user_agent):
    # create a persistent session
    session = requests.Session()

    # set the cf_clearance cookie in the session
    session.cookies.set("cf_clearance", cf_clearance)

    # set the same User-Agent header
    session.headers.update({"User-Agent": user_agent})

    # set the proxy for the session
    session.proxies.update(
        {
            "http": proxy,
            "https": proxy,
        }
    )

    # request the target URL
    try:
        response = session.get(url)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        return f"Error making request: {e}"

Next, execute the scraper function only if cf_clearance exists. Add this below the cf_clearance_scraper function execution:

scraper.py
# ...
if cf_clearance:
    # use the cf_clearance cookie in a request
    response = scraper(target_url, cf_clearance, proxy, user_agent)
    print(response)
else:
    print("Failed to retrieve cf_clearance cookie. Exiting.")

Combine the snippets in this section with the previous ones. Here's the complete code:

scraper.py
import subprocess
import re
import requests

def cf_clearance_scraper(url, proxy, user_agent):
    command = [
        "python",
        "main.py",
        "-p",
        proxy,
        "-t",
        "60",
        "-ua",
        user_agent,
        "-f",
        "cookies.json",
        url,
    ]

    try:
        # run the command and capture output
        process = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

        output = process.stdout

        # regex to extract the cf_clearance value
        match = re.search(r"cf_clearance=([^\s]+)", output)
        if match:
            cf_clearance = match.group(1)
            return cf_clearance
        else:
            return None

    except Exception as e:
        print(f"Error running the command: {e}")
        return None


def scraper(url, cf_clearance, proxy, user_agent):
    # create a persistent session
    session = requests.Session()

    # set the cf_clearance cookie in the session
    session.cookies.set("cf_clearance", cf_clearance)

    # set the same User-Agent header
    session.headers.update({"User-Agent": user_agent})

    # set the proxy for the session
    session.proxies.update(
        {
            "http": proxy,
            "https": proxy,
        }
    )

    # request the target URL
    try:
        response = session.get(url)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        return f"Error making request: {e}"


# define the target URL, User Agent, and proxy address
target_url = "https://www.scrapingcourse.com/cloudflare-challenge"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
proxy = "http://190.58.248.86:80"

# get the cf_clearance cookie
cf_clearance = cf_clearance_scraper(target_url, proxy, user_agent)

if cf_clearance:
    # use the cf_clearance cookie in a request
    response = scraper(target_url, cf_clearance, proxy, user_agent)
    print(response)
else:
    print("Failed to retrieve cf_clearance cookie. Exiting.")

The above scraper may occasionally bypass Cloudflare. You might need to run it several times to get a successful result.

A successful execution outputs the protected site's full-page HTML:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Cloudflare Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Cloudflare challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>
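
Rather than rerunning the script by hand, you can wrap both functions in a simple retry loop. Here's a minimal sketch that assumes the cf_clearance_scraper and scraper functions from the full code above; the success check relies on the demo page's success message, so adapt it to your target site:

Example
import time

# hypothetical retry helper (not part of CF-Clearance-Scraper):
# scrape a fresh cookie and retry until the challenge is bypassed
def scrape_with_retries(url, proxy, user_agent, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        cf_clearance = cf_clearance_scraper(url, proxy, user_agent)
        if cf_clearance:
            html = scraper(url, cf_clearance, proxy, user_agent)
            # the Cloudflare Challenge demo page shows this message on success
            if "You bypassed the Cloudflare challenge" in html:
                return html
        print(f"Attempt {attempt} failed, retrying...")
        time.sleep(5)  # brief pause before the next attempt
    return None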

If using a rotating proxy, opt for a service that supports sticky sessions, like the ZenRows Residential Proxies. This feature allows you to pause proxy rotation and maintain a single IP for a specified period. Set the sticky period within a reasonable limit to persist the IP throughout the request session.

For instance, with a ZenRows sticky session (ttl) of 1 minute, your proxy URL becomes:

Example
proxy = "http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>_ttl-1m_session-YPMlnbAybl5q@superproxy.zenrows.com:1337"

For more information, check ZenRows' sticky session documentation.
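
Remember that the same sticky proxy URL must serve both the cookie scrape and the follow-up requests; otherwise, the IP changes mid-session and the cookie becomes invalid. With the previous code, that only requires swapping in the sticky URL (placeholder credentials shown):

Example
# reuse one sticky proxy URL for the cookie scrape and the session requests
proxy = "http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>_ttl-1m_session-YPMlnbAybl5q@superproxy.zenrows.com:1337"

cf_clearance = cf_clearance_scraper(target_url, proxy, user_agent)
response = scraper(target_url, cf_clearance, proxy, user_agent)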

While CF-Clearance-Scraper can help you bypass Cloudflare, its low success rate means it's unreliable. Let's see a solution that works all the time.

Avoid Getting Blocked

Despite being a useful tool for bypassing Cloudflare, CF-Clearance-Scraper has some limitations that make it unreliable. First, the cf_clearance cookie is IP-bound, so it becomes invalid if the IP used to solve the challenge changes.

The tool also relies on a Chrome instance under the hood, which consumes a lot of memory and makes it unsuitable for large-scale scraping.

Another limitation of CF-Clearance-Scraper is that the cf_clearance cookie can expire mid-scraping. This results in broken executions, incomplete data, or compromised data pipelines. Besides, Cloudflare consistently updates its security measures to clamp down on open-source solutions like CF-Clearance-Scraper. This can render the tool obsolete in the long run.

The best way to scrape any website reliably without getting blocked is to use a web scraping solution like the ZenRows Universal Scraper API. ZenRows has a 99.93% success rate, ensuring you get all the data you need at scale without limitations.

With ZenRows, you no longer need to manage the cf_clearance cookie or worry about expiring sessions. It consistently adapts to evolving Cloudflare security measures with zero effort from you. That way, you can focus on other business logic rather than wasting time and resources fixing missing or broken data pipelines.

Let's see how ZenRows' Universal Scraper API works by scraping the previous Cloudflare Challenge page.

Sign up and go to the Request Builder. Then, paste the target URL in the link box and activate Premium Proxies and JS Rendering.

[Image: Building a scraper with ZenRows' Request Builder]

Select Python as your programming language and choose the API connection mode. Copy the generated code and paste it into your scraper.

The generated Python code should look like this:

scraper.py
# pip3 install requests
import requests

url = "https://www.scrapingcourse.com/cloudflare-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

The above code bypasses the anti-bot challenge and outputs the protected site's full-page HTML, as shown:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Cloudflare Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Cloudflare challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! 🎉 Your scraper now bypasses Cloudflare consistently. No more intermittent failures and missing data.

Conclusion

You've learned to extract and use the cf_clearance cookie to bypass Cloudflare during scraping. You used the CF-Clearance-Scraper to scrape the cookie from a Cloudflare-protected site and deployed it as an access key on the same site.

That said, CF-Clearance-Scraper still fails on several occasions due to some limitations. To avoid unpredictable scraping outcomes and extract data confidently, we recommend using ZenRows, an all-in-one scraping solution for all enterprise levels.

Try ZenRows for free!
