How to Bypass CAPTCHA With Selenium in Ruby

July 9, 2024 · 7 min read

Does your Ruby Selenium web scraper get blocked by CAPTCHA? No worries: You're about to learn how to solve this problem.

In this article, you'll learn two methods to bypass CAPTCHA while scraping with Selenium in Ruby: using the Undetected ChromeDriver and using a web scraping API.

Can Selenium Ruby Bypass CAPTCHA?

The short answer is yes, but you need to give your Ruby Selenium web scraper a boost.

There are two ways to handle CAPTCHAs when scraping with Ruby:

  • Solve the CAPTCHA after it appears.
  • Bypass the CAPTCHA so it's not triggered. 

The most effective option is to bypass the CAPTCHA and prevent it from appearing. CAPTCHAs that have already been displayed are harder to overcome since a human must solve them.

In this tutorial, we'll focus on CAPTCHA bypass and show you two methods of doing it: a free one and a paid but foolproof one. Let's go!

Method #1: Use Undetected ChromeDriver With Selenium and Ruby

The Undetected ChromeDriver is an optimized version of the Selenium ChromeDriver designed to avoid anti-bot detection. Although it's a Python library, you can use it in Ruby by passing its executable file to Selenium's ChromeDriver service. To do that, you need a bit of Python knowledge.

The idea is to create an executable file of the Undetected ChromeDriver with Python and use it to run the Selenium ChromeDriver in Ruby. 

Let's bypass a simple Turnstile CAPTCHA on nowsecure.nl to see how this method works.

Here's what the target website looks like:

Nowsecure Turnstile Demo

To begin, create an Undetected ChromeDriver executable file with Python. Ensure you've installed the Python library using pip:

Terminal
pip install undetected-chromedriver

Create a Python file in your code editor and input the following code:

scraper.py
# import the required modules
import undetected_chromedriver as uc
from multiprocessing import freeze_support

if __name__ == '__main__':

    # call freeze support to ensure the creation of an executable
    freeze_support()

    # create a ChromeDriver instance
    driver = uc.Chrome(headless=False, use_subprocess=False)

    # quit the driver
    driver.quit()

Run that code with this command:

Terminal
python scraper.py

The command will create a new Undetected ChromeDriver executable file in the following directory (for Windows). If you can't find the AppData folder, it may be hidden on your computer. In that case, enable the option to show hidden files and folders in C:\Users\<YOUR_USERNAME>.

Example
C:/Users/<YOUR_USERNAME>/AppData/Roaming/undetected_chromedriver/undetected_chromedriver.exe

The path might be different on your machine but should default to ...AppData/Roaming/. An equivalent of that directory on Linux should be:

Example
~/.local/share/undetected_chromedriver/undetected_chromedriver
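
If your scraper has to run on more than one operating system, you could compute the default location instead of hardcoding it. The following is a minimal sketch that only encodes the two default locations mentioned above. The helper name `default_uc_driver_path` and its parameters are hypothetical, and your installation may place the file elsewhere.

```ruby
require 'rbconfig'

# Hypothetical helper: returns the default Undetected ChromeDriver location
# described in this tutorial for Windows and Linux. Other systems are not covered.
def default_uc_driver_path(host_os: RbConfig::CONFIG['host_os'], env: ENV)
  if host_os.match?(/mswin|mingw|cygwin/i)
    # %APPDATA% normally points to C:/Users/<YOUR_USERNAME>/AppData/Roaming
    File.join(env.fetch('APPDATA'), 'undetected_chromedriver', 'undetected_chromedriver.exe')
  elsif host_os.match?(/linux/i)
    File.join(env.fetch('HOME'), '.local', 'share', 'undetected_chromedriver', 'undetected_chromedriver')
  else
    raise NotImplementedError, "no default path known for #{host_os}"
  end
end
```

Calling `default_uc_driver_path` without arguments uses the current machine's settings. Treat the result as a starting point and confirm that the file actually exists there.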

Now, let's bypass the CAPTCHA with the Undetected ChromeDriver in Ruby. 

Locate your Chrome browser path, as you'll also use that in your scraper. It should default to the following directory on Windows:

Example
C:/Program Files/Google/Chrome/Application/chrome.exe

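The default Chrome executable path on Linux is usually the following (note that ~/.config/google-chrome is Chrome's profile directory, not the browser executable):

Example
/usr/bin/google-chrome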

Import Selenium WebDriver. Then, specify the paths to your Chrome browser and the Undetected ChromeDriver executable.

scraper.rb
# import the required Gem
require 'selenium-webdriver'

# set the path to your actual Chrome browser executable file
chrome_exe_path = 'C:/Program Files/Google/Chrome/Application/chrome.exe'

# set the path to the undetected_chromedriver executable file
undetected_chromedriver_path = 'C:/Users/<YOUR_USERNAME>/AppData/Roaming/undetected_chromedriver/undetected_chromedriver.exe'
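
A wrong path may only surface later as a less obvious driver start-up error, so it can help to check both files before creating the driver. This is a small sketch using only Ruby's standard library. `ensure_file!` is a hypothetical helper name, not part of Selenium.

```ruby
# Hypothetical helper: fail early with a readable message if a path is wrong.
def ensure_file!(path, label)
  return path if File.file?(path)

  raise ArgumentError, "#{label} not found at #{path}"
end

# Possible usage with the variables defined above:
# ensure_file!(chrome_exe_path, 'Chrome browser')
# ensure_file!(undetected_chromedriver_path, 'Undetected ChromeDriver')
```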

Add the Chrome installation path to the Selenium Chrome options. Configure the ChromeDriver service using the Undetected ChromeDriver by pointing to its executable path. Create a driver instance that includes the Chrome options and service settings:

scraper.rb
# ...

# set Chrome options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless')
options.binary = chrome_exe_path

# configure ChromeDriver service with the specified path
service = Selenium::WebDriver::Service.chrome(path: undetected_chromedriver_path)

# create a new WebDriver instance
driver = Selenium::WebDriver.for :chrome, options: options, service: service

Open the protected web page and add a sleep function to allow your scraper some time to bypass the Turnstile CAPTCHA. Finally, grab a screenshot of the web page to see if you've bypassed the CAPTCHA:

scraper.rb
# ...

# navigate to a website
begin
  driver.navigate.to 'https://nowsecure.nl'

  # allow Undetected ChromeDriver some time to bypass the Turnstile challenge
  sleep(10)

  # take a screenshot to see if you passed
  driver.save_screenshot('nowsecure_screenshot.png')
  puts 'Screenshot saved as nowsecure_screenshot.png'
ensure
  # close the driver instance
  driver.quit
end
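
A fixed `sleep(10)` waits the full ten seconds even when the challenge clears sooner, and simply proceeds if it takes longer. One alternative is to poll for a condition with a timeout. The sketch below is a generic, dependency-free version of that idea. `wait_until` and its parameters are hypothetical names, and what counts as "challenge passed" on a given site is something you'd have to determine yourself.

```ruby
# Hypothetical helper: re-run the block until it returns a truthy value
# or the timeout (in seconds) elapses.
def wait_until(timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end

    sleep(interval)
  end
end
```

In the scraper, the block could inspect the page, for example `wait_until(timeout: 15) { driver.title.include?('<EXPECTED_TITLE>') }`, where the expected title is only a placeholder. Selenium's Ruby bindings also ship a `Selenium::WebDriver::Wait` class that serves the same purpose.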

You'll get the following code after combining all the snippets:

scraper.rb
# import the required Gem
require 'selenium-webdriver'

# set the path to your actual Chrome browser executable file
chrome_exe_path = 'C:/Program Files/Google/Chrome/Application/chrome.exe'

# set the path to the undetected_chromedriver executable file
undetected_chromedriver_path = 'C:/Users/<YOUR_USERNAME>/AppData/Roaming/undetected_chromedriver/undetected_chromedriver.exe'

# set Chrome options
options = Selenium::WebDriver::Chrome::Options.new
options.binary = chrome_exe_path
options.add_argument('--headless')

# configure ChromeDriver service with the specified path
service = Selenium::WebDriver::Service.chrome(path: undetected_chromedriver_path)

# create a new WebDriver instance
driver = Selenium::WebDriver.for :chrome, options: options, service: service

# navigate to a website
begin
  driver.navigate.to 'https://nowsecure.nl'

  # allow Undetected ChromeDriver some time to bypass the Turnstile challenge
  sleep(10)

  # take a screenshot to see if you passed
  driver.save_screenshot('nowsecure_screenshot.png')
  puts 'Screenshot saved as nowsecure_screenshot.png'
ensure
  # close the driver instance
  driver.quit
end

Run the code to bypass CAPTCHA. Here's the generated screenshot, showing a success message in the Turnstile iframe:

Nowsecure Turnstile Success

You've just bypassed CAPTCHA with Selenium's Undetected ChromeDriver in Ruby. 

However, the Undetected ChromeDriver won't bypass the more advanced protections of anti-bot systems like Cloudflare, Akamai, and DataDome.

Let's try to access the Cloudflare-protected G2 Reviews website to prove it. Replace the target URL in the previous code with G2's URL.

You'll see that the site blocks your scraper instead of loading the reviews page.


However, there are ways to bypass even the most advanced anti-bot solutions. Keep reading!

Method #2: Bypass CAPTCHA With a Web Scraping API

CAPTCHAs and anti-bots like Cloudflare will block most free open-source solutions. That's because most complex anti-bots use advanced bot detection mechanisms such as browser fingerprinting and machine learning, which free bypass solutions can't cope with.

The best way to bypass any CAPTCHA is via a web scraping API like ZenRows. It provides a full-fledged anti-bot bypass toolkit, including premium proxy auto-rotation, a headless browser, a request header optimizer, and more.

Let's use ZenRows to access the G2 Reviews page that blocked you earlier.

Sign up to open the ZenRows Request Builder. Once in the Builder, paste the target URL in the link box, check the Premium Proxies checkbox, and click JS Rendering. Select the API connection mode and choose Ruby as your programming language. Copy and paste the generated code into your scraper.rb file.

ZenRows Request Builder

Here's the generated code:

scraper.rb
# gem install faraday
require 'faraday'

url = URI.parse('https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fasana%2Freviews&js_render=true&premium_proxy=true')
conn = Faraday.new()
conn.options.timeout = 180
res = conn.get(url, nil, nil)
print(res.body)

The code scrapes the protected website's HTML, as shown below:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>

Congratulations! You just bypassed an advanced CAPTCHA with ZenRows and are ready to scrape any website without getting blocked.
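
The generated request URL earlier in this section embeds the API key and a hand-encoded target URL. As an optional variation, you could assemble the same query string with Ruby's standard `URI.encode_www_form` and read the key from an environment variable. This is a sketch under assumptions: the `zenrows_url` helper and the `ZENROWS_API_KEY` variable name are hypothetical, while the endpoint and parameter names are copied from the generated code.

```ruby
require 'uri'

# Hypothetical helper: builds the same request URL as the generated code.
def zenrows_url(target_url, api_key:)
  query = URI.encode_www_form(
    apikey: api_key,
    url: target_url, # percent-encoded automatically
    js_render: 'true',
    premium_proxy: 'true'
  )
  URI.parse("https://api.zenrows.com/v1/?#{query}")
end

# ZENROWS_API_KEY is an assumed environment variable name.
api_key = ENV.fetch('ZENROWS_API_KEY', '<YOUR_ZENROWS_API_KEY>')
url = zenrows_url('https://www.g2.com/products/asana/reviews', api_key: api_key)
puts url
```

The resulting `url` can be passed to `conn.get` in the same way as in the generated snippet, which keeps the key out of your source file and makes it easier to change the target page.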

Conclusion

You've learned the two ways to handle CAPTCHAs while scraping with Selenium in Ruby. While both methods will work for some anti-bot systems, the recommended approach is using the web scraping API. This solution will let your scraper work uninterrupted, ensuring you can scrape all the data you need without worrying about extra setups, language limitations, and bottlenecks caused by failed requests.

The best web scraping API that guarantees success is ZenRows, an all-in-one content extraction toolkit for scraping any website, regardless of its protection level. Try ZenRows for free now without a credit card!
