CAPTCHAs are one of the biggest challenges faced when scraping websites with Python Requests. These frustrating pop-ups can easily halt your scraping progress.
Luckily, there are several proven ways to bypass CAPTCHA, but for this tutorial, we'll focus on the following three methods:
- Method #1: Bypass CAPTCHA with Python Requests and 2Captcha.
- Method #2: Bypass CAPTCHA with a web scraping API.
- Method #3: Rotate User Agents.
Let's dive in and explore each method in detail.
Method #1: Bypass CAPTCHA with Python Requests and 2Captcha
Many websites use CAPTCHA to protect their content from bots and unauthorized access. These tests are designed to ensure that only human visitors can proceed, making them a big obstacle for web scrapers.
One common way to solve CAPTCHAs is by using third-party services like 2Captcha. These services often rely on human solvers or advanced algorithms to decode CAPTCHA challenges and return a solution. However, this process can take a while, which may slow down your scraping efforts.
Let's put it to the test with this CAPTCHA challenge.
Start by installing the necessary dependencies:
pip3 install requests beautifulsoup4 2captcha-python
Requests makes the HTTP requests, the 2Captcha library solves the CAPTCHA challenge, BeautifulSoup parses the HTML to extract useful information, and urljoin (from Python's built-in urllib) handles relative URLs.
Next, initialize the 2Captcha solver using your API key and set up the URL of the CAPTCHA challenge page.
import requests
from twocaptcha import TwoCaptcha
from bs4 import BeautifulSoup
from urllib.parse import urljoin
# your 2Captcha api key
api_key = 'YOUR_2CAPTCHA_API_KEY'
# initialize the 2Captcha solver
solver = TwoCaptcha(api_key)
# url of the captcha challenge page
url = "https://2captcha.com/demo/normal"
# start a session to maintain cookies
session = requests.Session()
# send a request to the captcha page to download the image
response = session.get(url)
The next step is to download the CAPTCHA image from the target page, which we'll send to 2Captcha for solving.
# ...
# parse the html to extract the captcha image url
soup = BeautifulSoup(response.content, 'html.parser')
# locate the captcha image using the 'alt' attribute
captcha_img_tag = soup.find("img", {"alt": "normal captcha example"})
captcha_img_url = captcha_img_tag['src']
# handle relative urls by joining with the base url
captcha_img_url = urljoin(url, captcha_img_url)
# download the captcha image
captcha_img_response = session.get(captcha_img_url)
# save the captcha image locally (necessary for 2Captcha api)
captcha_image_path = "captcha_image.jpg"
with open(captcha_image_path, "wb") as f:
    f.write(captcha_img_response.content)
After downloading the image, send it to 2Captcha for solving and print the result.
# ...
# send the captcha image to 2Captcha for solving
try:
    result = solver.normal(captcha_image_path)
    print(f"captcha solved: {result['code']}")
except Exception as e:
    print(f"error solving captcha: {e}")
    exit()
Here's the complete code:
import requests
from twocaptcha import TwoCaptcha
from bs4 import BeautifulSoup
from urllib.parse import urljoin
# your 2Captcha api key
api_key = 'YOUR_2CAPTCHA_API_KEY'
# initialize the 2Captcha solver
solver = TwoCaptcha(api_key)
# url of the captcha challenge page
url = "https://2captcha.com/demo/normal"
# start a session to maintain cookies
session = requests.Session()
# send a request to the captcha page to download the image
response = session.get(url)
# parse the html to extract the captcha image url
soup = BeautifulSoup(response.content, 'html.parser')
# locate the captcha image using the 'alt' attribute
captcha_img_tag = soup.find("img", {"alt": "normal captcha example"})
captcha_img_url = captcha_img_tag['src']
# handle relative urls by joining with the base url
captcha_img_url = urljoin(url, captcha_img_url)
# download the captcha image
captcha_img_response = session.get(captcha_img_url)
# save the captcha image locally (necessary for 2Captcha api)
captcha_image_path = "captcha_image.jpg"
with open(captcha_image_path, "wb") as f:
    f.write(captcha_img_response.content)
# send the captcha image to 2Captcha for solving
try:
    result = solver.normal(captcha_image_path)
    print(f"captcha solved: {result['code']}")
except Exception as e:
    print(f"error solving captcha: {e}")
    exit()
If the CAPTCHA is successfully solved, you will see an output similar to:
captcha solved: W9H5K
Congratulations! You've successfully bypassed CAPTCHA with Python Requests and 2Captcha. While 2Captcha is a practical tool for small-scale data extraction and testing purposes, it may not be the most economical option for large-scale scraping projects. Additionally, it does not solve all CAPTCHA types.
Let's explore another alternative.
Method #2: Bypass CAPTCHA With a Web Scraping API
The best approach to bypassing CAPTCHAs is to avoid them. By mimicking natural user behavior and not getting blocked, you can often move through sites without setting off anti-bot systems and CAPTCHA challenges.
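For instance, sending realistic browser headers with each request already makes traffic look less bot-like than Requests' bare defaults. Here's a minimal sketch; the header values are illustrative, not a guaranteed fingerprint:

```python
import requests

# browser-like headers (illustrative values, not a complete fingerprint)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# a session carries cookies and headers across requests, like a real browser
session = requests.Session()
session.headers.update(headers)

# session.get("https://example.com") would now send these headers
print(session.headers["User-Agent"])
```

This alone won't defeat advanced anti-bot systems, but it removes the most obvious bot signal: the default `python-requests` User Agent.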
Using web scraping APIs like ZenRows is a reliable way to bypass any CAPTCHA, no matter how complex the anti-bot measures are.
With features like auto-rotating premium proxies, user agent rotation, geolocation, and more, ZenRows provides everything you need to scrape without getting blocked.
Let's see ZenRows in action against this Anti-bot challenge page.
Sign up for free, and you'll be redirected to the Request Builder page.
Enter the target URL, activate Premium Proxies and enable the JS Rendering boost mode. Choose Python and click on the API tab to get the generated code.

Copy the request code generated on the right. The code uses Requests, so install the library using the following command:
pip3 install requests
The Request Builder will generate Python code similar to this:
# pip3 install requests
import requests
url = 'https://www.scrapingcourse.com/antibot-challenge'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
Run it, and you'll get the HTML content of your target web page:
<html lang="en">
<head>
<!-- ... -->
<title>Antibot Challenge - ScrapingCourse.com</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
<h2>
You bypassed the Antibot challenge! :D
</h2>
<!-- other content omitted for brevity -->
</body>
</html>
Congratulations, you've successfully scraped a CAPTCHA-protected page with ZenRows!
Method #3: Rotate User Agents
CAPTCHAs can often be triggered when websites detect repetitive behavior from the same User Agent. A User Agent is a string sent with each HTTP request that identifies the browser or client and operating system being used. Bots are frequently flagged when they repeatedly use the same User Agent, which makes it clear they're not genuine users.
By switching the User Agent for each request, you appear like a real user and reduce the chance of being blocked.
Let's see how to rotate user agents using Python Requests. Start by installing the Requests library in your terminal:
pip3 install requests
Then, import the required libraries:
import requests
import itertools
Next, create a list of common User Agents. We compiled a list of the best User Agents you can use while scraping:
# ...
# create a User Agent list
user_agent_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    # ...
]
Define a function to rotate user agents using the itertools.cycle function:
# ...
# define a User Agent rotator
def rotate_ua(user_agent_list):
    return itertools.cycle(user_agent_list)
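To see what itertools.cycle does, here's a quick standalone example: once the list is exhausted, next() wraps back to the first item.

```python
import itertools

# a two-item cycle over placeholder agent names
ua_cycle = itertools.cycle(["agent-a", "agent-b"])

# next() walks through the list and wraps around at the end
first = next(ua_cycle)   # "agent-a"
second = next(ua_cycle)  # "agent-b"
third = next(ua_cycle)   # "agent-a" again
print(first, second, third)
```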
Then, create a generator instance to keep rotating through the user agents:
# ...
# create a generator instance
user_agent_generator = rotate_ua(user_agent_list)
Finally, let's use this generator to send HTTP requests while rotating the user agent for each one:
# ...
# rotate the User Agent for 4 requests
for request in range(4):
    # send a request to httpbin.io
    response = requests.get(
        "https://httpbin.io/user-agent",
        headers={"User-Agent": next(user_agent_generator)},
    )
    # print the response text to see the current User Agent
    print(response.text)
Here's the full code:
# import the required libraries
import requests
import itertools
# create a User Agent list
user_agent_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    # ...
]
# define a User Agent rotator
def rotate_ua(user_agent_list):
    return itertools.cycle(user_agent_list)
# create a generator instance
user_agent_generator = rotate_ua(user_agent_list)
# rotate the User Agent for 4 requests
for request in range(4):
    # send a request to httpbin.io
    response = requests.get(
        "https://httpbin.io/user-agent",
        headers={"User-Agent": next(user_agent_generator)},
    )
    # print the response text to see the current User Agent
    print(response.text)
The output will show different user agents for each request:
# request 1
{
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}
# request 2
{
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}
# request 3
{
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0"
}
By rotating User Agents, you can make your scraper's behavior resemble a real user's, significantly lowering the chances of being blocked by CAPTCHA. However, relying solely on this may not always be sufficient. Here are a few limitations of this method:
- Detection Algorithms: Websites employ sophisticated algorithms that analyze patterns in user behavior. If requests are made too frequently or exhibit unnatural patterns, even varied user agents can trigger CAPTCHAs.
- IP Address Tracking: Many websites monitor IP addresses for suspicious activity. If numerous requests come from the same IP, it may still be flagged, regardless of the user agent being used.
To make rotating User Agents more effective, you can combine it with additional measures like IP rotation, introducing random delays between requests, maintaining session consistency, and regularly updating your list of user agents.
These combined strategies help mimic genuine user behavior and significantly reduce the risk of triggering CAPTCHAs.
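As a sketch of how two of these measures combine, the snippet below pairs the User Agent rotator with a random pause before each request. The 1-3 second delay bounds are arbitrary, and the actual time.sleep() and requests.get() calls are left as comments so the sketch runs offline:

```python
import itertools
import random

user_agent_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]
user_agent_generator = itertools.cycle(user_agent_list)

# build a plan of (user agent, delay) pairs for 4 requests
plan = []
for _ in range(4):
    ua = next(user_agent_generator)
    # a random 1-3 second pause breaks up machine-like request timing
    delay = round(random.uniform(1, 3), 1)
    plan.append((ua, delay))
    # in a real run: time.sleep(delay); requests.get(url, headers={"User-Agent": ua})

for ua, delay in plan:
    print(f"wait {delay}s, then request with: {ua[:40]}...")
```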
Conclusion
We explored various methods for bypassing CAPTCHA using Python Requests. While 2Captcha works well for small-scale scraping, it can become costly and impractical at larger scales. Rotating user agents can help make your bot appear more human, but it may not be sufficient for more sophisticated anti-bot systems.
To ensure successful data extraction, you need a web scraping API like ZenRows to bypass all CAPTCHAs and scrape any website without getting blocked. Sign up now to try ZenRows for free.