Web Crawling Webinar for Tech Teams

How to Bypass CAPTCHA with Selenium

Updated: October 14, 2023 · 6 min read

A third of the top 100,000 websites use CAPTCHAs to stop bot traffic, so they've probably complicated your scraping plans at some point.

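The challenges are designed as a gate to let humans in and keep bots out, but does that mean you can't beat them? Luckily, you can! Today, you'll learn how to bypass CAPTCHA with Selenium in Python using three different methods:

  • Method #1: Bypass CAPTCHA with Selenium and 2Captcha.
  • Method #2: Implement Selenium Stealth.
  • Method #3: Bypass CAPTCHA with ZenRows API.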

Cool? Let's dive in. 🚀

Can Selenium Bypass CAPTCHA?

CAPTCHAs usually appear when the visitor exhibits suspiciously bot-like behavior, such as visiting many pages without scrolling, but Selenium can help here.

There are two main approaches to get rid of the tests: you can solve them by paying humans (expensive at scale, yet sometimes useful), or prevent them from appearing by implementing advanced techniques (and retrying your request if any fail). Selenium allows you to interact with the form in the first case and make your traffic look more human in the second.
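The "retry if any fail" idea can be sketched in plain Python. This is a minimal illustration, not a prescribed implementation: fetch_with_retries, its parameters, and the "captcha" substring check are hypothetical names and heuristics chosen for the example.

```python
import random
import time


def fetch_with_retries(fetch, is_blocked, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until is_blocked(result) is False or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if not is_blocked(result):
            return result
        if attempt < max_attempts:
            # Exponential backoff with a little jitter so retries are not evenly spaced.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)
    raise RuntimeError(f"still blocked after {max_attempts} attempts")


# Demo with stand-in data: the first two responses look like a CAPTCHA page.
pages = iter(["<title>captcha</title>", "<title>captcha</title>", "<title>Home</title>"])
html = fetch_with_retries(
    fetch=lambda: next(pages),
    is_blocked=lambda page: "captcha" in page.lower(),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(html)
```

In a real scraper, fetch would load the page with Selenium and return driver.page_source, and is_blocked would use whatever marker the target site shows on its challenge page.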

Method #1: Bypass CAPTCHA with Selenium and 2Captcha

Let's imagine you need to solve a CAPTCHA, for example, to submit a form. We'll use a popular service called 2Captcha to handle the CAPTCHA in Selenium on a demo page.

We'll explore how the service works against a popular CAPTCHA challenge. So, let's get started by installing some dependencies. If you don't have them yet, run the command pip install selenium 2captcha-python and import some modules as shown below.

Note: In the past, installing a browser driver manually was a mandatory step, but this is no longer the case. Selenium 4.6 and later include Selenium Manager, which downloads a matching driver automatically. If you have an older Selenium version, upgrading is recommended to access the latest features and capabilities. You can check your current version with the command pip show selenium and update to the latest version using pip install --upgrade selenium.
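If you prefer to check the version from Python rather than the command line, something like the sketch below could work. It assumes 4.6 as the threshold (the release that introduced Selenium Manager); parse_version and has_selenium_manager are hypothetical helper names, and the parser only handles simple dotted versions.

```python
from importlib import metadata


def parse_version(version):
    """Turn '4.11.2' into (4, 11, 2), ignoring non-numeric suffixes such as 'b1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_selenium_manager(version):
    """True when the version is 4.6 or newer."""
    return parse_version(version) >= (4, 6)


try:
    installed = metadata.version("selenium")
    print(installed, "includes Selenium Manager:", has_selenium_manager(installed))
except metadata.PackageNotFoundError:
    print("selenium is not installed")
```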

program.py
from selenium.webdriver.common.by import By
from twocaptcha import TwoCaptcha
from selenium import webdriver
import time

Then, open a Chrome instance and navigate to the demo page.

program.py
driver = webdriver.Chrome()
url = "https://2captcha.com/demo/normal"
driver.get(url)

The next step is to locate the CAPTCHA image and pass its URL to the solver.normal() method, which returns a dictionary containing the solved text under the code key. Save it in the result variable.

program.py
# The hashed class name comes from the demo page and may change over time
imgResults = driver.find_elements(By.XPATH, "//img[contains(@class,'_2hXzbgz7SSP0DXCyvKWcha')]")
solver = TwoCaptcha("Your_2Captcha_API_key")  # replace with your own API key
result = solver.normal(imgResults[0].get_attribute("src"))
print("solved: " + str(result))
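Two things can go wrong at this step: the locator may match nothing, and the API key may be missing. The sketch below shows one way to fail with a clear message in both cases. The helper names and the TWOCAPTCHA_API_KEY environment variable are assumptions made for illustration, and FakeImage only stands in for a Selenium element so the example runs without a browser.

```python
import os


def first_captcha_src(elements):
    """Return the src of the first matched image, or raise a clear error."""
    if not elements:
        raise LookupError("no CAPTCHA image matched the locator")
    src = elements[0].get_attribute("src")
    if not src:
        raise LookupError("CAPTCHA image has no src attribute")
    return src


def load_api_key(env=os.environ, name="TWOCAPTCHA_API_KEY"):
    """Read the key from an environment variable (the variable name is hypothetical)."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"set {name} before running the script")
    return key


class FakeImage:
    """Stand-in for a Selenium WebElement, used only in this demo."""

    def __init__(self, src):
        self.src = src

    def get_attribute(self, name):
        return self.src if name == "src" else None


print(first_captcha_src([FakeImage("https://example.com/captcha.png")]))
print(load_api_key(env={"TWOCAPTCHA_API_KEY": " abc123 "}))
```

With the real script, you would pass imgResults to first_captcha_src() and the returned key to TwoCaptcha().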

The following task is to find the input field, fill it with the solution text received by the 2Captcha service, and click the submit button.

program.py
captchafield = driver.find_element(By.XPATH,"//input[contains(@class,'_26Pq0m_qFk19UXx1w0U5Kv')]")
captchafield.send_keys(result['code'])

button = driver.find_element(By.XPATH,"//button[contains(@class, 'l2z7-tVRGe-3sq5kU4uu5 _2xjDiWmBxfqem8nGQMmGci _2HIb5VBFp6Oi5_JoLdEcl6 _2vbG_IBm-DpI5KeEAHJkRy')]")
button.click()
time.sleep(10)

To finish, locate the <p> element on the page and print its message. You should get "Captcha is passed successfully!" in case of success.

program.py
messagefield = driver.find_element(By.XPATH, "//p[contains(@class,'_2WOJoV7Dg493S8DW_GobSK')]")
print(messagefield.text)
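A fixed time.sleep(10) either wastes time or is too short on a slow connection. Selenium provides the WebDriverWait class for polling. The sketch below shows the same idea in plain Python so it can run without a browser; wait_until is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        try:
            value = condition()
        except Exception:
            value = None  # e.g. the element is not in the DOM yet
        if value:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poll)


# Demo with a stand-in condition that only succeeds on the third poll.
attempts = {"count": 0}


def message_text():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("message not rendered yet")
    return "Captcha is passed successfully!"


print(wait_until(message_text, timeout=5, poll=0.01))
```

With a live driver, the condition could be a lambda that calls driver.find_element(...) with the same XPath as above and returns the element's text.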

Here's the complete code:

program.py
from selenium.webdriver.common.by import By
from twocaptcha import TwoCaptcha
from selenium import webdriver
import time

driver = webdriver.Chrome()
url = "https://2captcha.com/demo/normal"
driver.get(url)

imgResults = driver.find_elements(By.XPATH, "//img[contains(@class,'_2hXzbgz7SSP0DXCyvKWcha')]")
solver = TwoCaptcha("Your_2Captcha_API_key")  # replace with your own API key
result = solver.normal(imgResults[0].get_attribute("src"))
print("solved: " + str(result))

captchafield = driver.find_element(By.XPATH,"//input[contains(@class,'_26Pq0m_qFk19UXx1w0U5Kv')]")
captchafield.send_keys(result["code"])

button = driver.find_element(By.XPATH,"//button[contains(@class, 'l2z7-tVRGe-3sq5kU4uu5 _2xjDiWmBxfqem8nGQMmGci _2HIb5VBFp6Oi5_JoLdEcl6 _2vbG_IBm-DpI5KeEAHJkRy')]")
button.click()
time.sleep(10)

messagefield = driver.find_element(By.XPATH, "//p[contains(@class,'_2WOJoV7Dg493S8DW_GobSK')]")
print(messagefield.text)

And here's the output:

Output
solved: {'captchaId': '72848141048', 'code': 'W9H5K'}
Captcha is passed successfully!

Congratulations! You solved your first CAPTCHA with Selenium and 2Captcha.

However, using paid solvers is hard to scale up since it's expensive and slow, and they only work with a fraction of all CAPTCHA types. Therefore, let's see an alternative next.


Method #2: Implement Selenium Stealth

Selenium is easy to identify because its default configuration sends clear bot signals, such as the User-Agent string of a headless browser, which can easily prompt a CAPTCHA. To prove this, let's try to access OpenSea, a protected site.
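To make "clear bot signals" concrete, here is a small sketch that checks two well-known giveaways: the HeadlessChrome token in the User-Agent and the navigator.webdriver flag. The bot_signals helper and the sample User-Agent string are illustrative assumptions. Real anti-bot systems inspect many more properties.

```python
def bot_signals(user_agent, webdriver_flag):
    """List obvious automation giveaways in a browser fingerprint."""
    signals = []
    if "HeadlessChrome" in user_agent:
        signals.append("User-Agent contains 'HeadlessChrome'")
    if webdriver_flag:
        signals.append("navigator.webdriver is true")
    return signals


# With a live driver, the two inputs could be read like this:
#   user_agent = driver.execute_script("return navigator.userAgent")
#   webdriver_flag = driver.execute_script("return navigator.webdriver")
sample_headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "HeadlessChrome/110.0.5481.105 Safari/537.36"
)
for signal in bot_signals(sample_headless_ua, webdriver_flag=True):
    print(signal)
```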

To test this, create a headless Chrome instance, pass the OpenSea URL to the get() function, wait for the page to load, and take a screenshot.

program.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)

driver.get("https://opensea.io/")
time.sleep(30)
driver.save_screenshot("opensea.png")
driver.close()

What's the result?

Selenium Output

Access denied! OpenSea detected non-human traffic and blocked our bot. This is where selenium-stealth, a Python package that helps avoid detection when scraping with Selenium, comes to the rescue. It makes your traffic look more like a regular user's and helps prevent blocks such as CAPTCHAs.

To start with, install the Stealth package: pip install selenium-stealth


Then, import the necessary libraries and configure the driver to run Chrome in headless mode. Next, exclude the enable-automation switch and disable useAutomationExtension, since websites can detect bot traffic through those flags.

program.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
from selenium_stealth import stealth
 
options = Options()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

The following step creates a ChromeDriver instance with custom options:

program.py
driver = webdriver.Chrome(options=options)

Now, let's use the stealth() function to set custom Chrome webdriver configurations to make our bot less detectable. For example, we'll set a user agent string, which is sent along with the HTTP request.

program.py
stealth(driver,
       user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.105 Safari/537.36',
       languages=["en-US", "en"],
       vendor="Google Inc.",
       platform="Win32",
       webgl_vendor="Intel Inc.",
       renderer="Intel Iris OpenGL Engine",
       fix_hairline=True,
       )

The script then navigates to the target page, waits for it to load, and screenshots it.

program.py
driver.get("https://opensea.io/")
time.sleep(30)
driver.save_screenshot("opensea2.png")
driver.close()

Here's the full code:

program.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
from selenium_stealth import stealth
 
options = Options()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

driver = webdriver.Chrome(options=options)

stealth(driver,
       user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.105 Safari/537.36',
       languages=["en-US", "en"],
       vendor="Google Inc.",
       platform="Win32",
       webgl_vendor="Intel Inc.",
       renderer="Intel Iris OpenGL Engine",
       fix_hairline=True,
       )

driver.get("https://opensea.io/")
time.sleep(30)
driver.save_screenshot("opensea2.png")
driver.close()

Let's look at the output:

OpenSea Output

Success! This helps bypass CAPTCHA with Selenium. You might also want to look at our guide on bypassing blocks with Selenium's Undetected ChromeDriver.

However, would Selenium Stealth work when scraping sites with the toughest anti-bot protections? Try it on a product review page on G2, and you'll get the answer.

Stealth Output G2

Since neither of the two methods turned out to be enough to reliably bypass CAPTCHA with Selenium, we should explore a better option.

Method #3: Bypass CAPTCHA with ZenRows API

ZenRows' Universal Scraper API offers the most reliable way to handle all CAPTCHAs on any website. It automatically manages premium proxies, browser emulation, header consistency, and request patterns to ensure your scraping succeeds.

Let's see how ZenRows performs against a protected page like the Antibot Challenge page.

First, create your ZenRows account and access the Request Builder.

building a scraper with zenrows

Type in your target website, turn on JS Rendering and activate Premium Proxies.

Next, choose Python and click on the API connection mode. Then, copy the generated code and paste it into your script.

scraper.py
# pip3 install requests
import requests

url = "https://www.scrapingcourse.com/antibot-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
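For anything beyond a quick test, it may help to add a timeout and to fail loudly on HTTP errors. The sketch below wraps the same request in two hypothetical helpers, build_params and fetch_html. The 60-second timeout is an arbitrary assumption, not a documented recommendation.

```python
def build_params(target_url, api_key):
    """Build the same query parameters as the generated snippet."""
    return {
        "url": target_url,
        "apikey": api_key,
        "js_render": "true",
        "premium_proxy": "true",
    }


def fetch_html(target_url, api_key, timeout=60):
    """Send the request with a timeout and raise on HTTP error responses."""
    import requests  # imported here so build_params can be used without it installed

    response = requests.get(
        "https://api.zenrows.com/v1/",
        params=build_params(target_url, api_key),
        timeout=timeout,
    )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


print(build_params("https://www.scrapingcourse.com/antibot-challenge", "<YOUR_ZENROWS_API_KEY>"))
```

Calling fetch_html() with your real API key returns the page HTML as a string, or raises an exception instead of silently returning an error page.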

The generated code uses Python's Requests library as the HTTP client. You can install this library using pip:

Terminal
pip3 install requests

Run the code, and you'll successfully access the page:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>
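Rather than eyeballing the HTML, you could check for the success heading programmatically. The sketch below uses only the standard library's html.parser; HeadingCollector and passed_challenge are hypothetical names, and the sample string is a trimmed copy of the output above.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data


def passed_challenge(html):
    """True when any <h2> contains the success message from the demo page."""
    parser = HeadingCollector()
    parser.feed(html)
    return any("You bypassed the Antibot challenge" in h for h in parser.headings)


sample = """<html lang="en"><body>
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
</body></html>"""
print(passed_challenge(sample))  # True
```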

Congratulations! 🎉 You’ve successfully bypassed the anti-bot challenge page using ZenRows. This works for any website.

Conclusion

In this guide, you've learned the essentials of handling CAPTCHAs in web scraping:

  • How to bypass CAPTCHA with Selenium and 2Captcha.
  • How to implement Selenium Stealth.
  • How to avoid CAPTCHAs on any website.

We've covered three different ways to handle CAPTCHAs while scraping with Selenium. Basic Selenium alone triggers CAPTCHAs constantly, and while 2Captcha helps solve them, it's expensive and slow at scale. Selenium Stealth improves things by making your bot look more human, but it still struggles with sophisticated anti-bot systems. ZenRows proves to be the most reliable solution to effectively bypass any CAPTCHA. Try ZenRows for free.
