Selenium in Ruby for Web Scraping: Tutorial 2024

March 17, 2024 · 9 min read

Selenium is the most popular browser automation tool for testing and web scraping, and Ruby is one of the languages supported by the project. Explore the Selenium Ruby gem selenium-webdriver.

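Here, you'll see the basics and then learn how to mimic more complex user interactions. By the end of this Selenium WebDriver Ruby tutorial, you'll know why and how to use Selenium in Ruby, how to interact with a browser, and how to avoid getting blocked.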

Why You Should Use Selenium in Ruby

Selenium is one of the best tools for browser automation, for both testing and scraping. Its consistent API opens the door to cross-platform, cross-browser, and cross-language automated scripts. If you want to test a web app or perform headless browser scraping, Selenium is the right tool!

Ruby is one of the many languages officially supported by the project. The bindings are available through the open-source gem selenium-webdriver.

Before moving on to the next section, consider exploring our guide on web scraping with Ruby.

How to Use Selenium in Ruby

Follow the steps below and learn how to retrieve data from the following infinite-scrolling demo page. This page dynamically loads new products as the user scrolls down via JavaScript. As a page that requires user interaction for data retrieval, you need Selenium to scrape it.

Infinite scrolling page

It's time to learn the basics of Selenium with Ruby!

Step 1: Install Selenium for Ruby

First, make sure you have Ruby installed on your computer. Download the installer and follow the wizard to set it up. You now have what you need to get started.

Create a folder for your Ruby project called selenium-ruby-demo and enter it in the terminal:

Terminal
mkdir selenium-ruby-demo
cd selenium-ruby-demo

Add a blank scraper.rb file inside it, and open the project folder in your favorite IDE. IntelliJ IDEA or Visual Studio Code with the Ruby LSP extension will do.

Awesome! selenium-ruby-demo will now contain a new Ruby project.

Time to install selenium-webdriver, the gem with the Ruby bindings for Selenium. Open the terminal and run:

Terminal
gem install selenium-webdriver

Wait a few seconds while the command installs the Selenium Ruby library.

Open scraper.rb and add the line below to import the gem in your script:

scraper.rb
require "selenium-webdriver"

Recent versions of selenium-webdriver come with Selenium Manager, which automatically locates or downloads the driver required to control a Chrome instance. Thus, you're ready to go!

Step 2: Scrape With Selenium in Ruby

Use these lines to initialize a WebDriver instance to control a headless Chrome window:

scraper.rb
# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

Don't forget to release the driver resources with this line at the end of the script:

scraper.rb
# close the browser and release its resources
driver.quit
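
If an exception is raised before that line runs, the browser process may stay open. Here is a minimal sketch of one way to guard against that. It assumes a hypothetical helper named with_driver, which is not part of selenium-webdriver and receives an already-created driver:

```ruby
# Hypothetical helper (not part of selenium-webdriver): runs the block
# with the given driver and always calls quit, even if the block raises.
def with_driver(driver)
  yield driver
ensure
  driver.quit
end

# Usage with the driver created above:
#   with_driver(Selenium::WebDriver.for(:chrome, options: options)) do |driver|
#     # scraping logic...
#   end
```

The ensure clause runs whether the block finishes normally or fails, so the browser is released in both cases.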

Next, use the navigate.to method to open the desired page in the controlled browser:

scraper.rb
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

Then, call the page_source method to get the source HTML of the page. Print it in the terminal with puts:

scraper.rb
# extract the HTML of the target page
# and print it
html = driver.page_source
puts html

Here's what the current scraper.rb file looks like so far:

scraper.rb
require "selenium-webdriver"

# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

# connect to the target page
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

# extract the HTML of the target page
# and print it
html = driver.page_source
puts html

# close the browser and release its resources
driver.quit

Comment out the line to configure Chrome in headless mode and execute your script:

Terminal
ruby scraper.rb

The Ruby selenium-webdriver gem will launch Chrome and open the Infinite Scrolling demo page:

infinite scrolling demo page

The Ruby script will also log the HTML associated with that page in the terminal:

Terminal
<html class="h-full"><head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="description" content="Learn to scrape infinite scrolling pages"><title>Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub</title>
  <link rel="icon" href="/static/img/icon.611132651e39.png" type="image/png">
  <!-- Omitted for brevity... -->

That's exactly the source HTML code of the target page. Well done!

Step 3: Extract the Data You're Interested In

The Selenium WebDriver Ruby gem gives you everything you need to scrape data from an HTML page. Assume the goal of your scraper is to collect the name and price of each product. Achieving that involves three tasks:

  1. Select the product cards on the page with an HTML element selection strategy.
  2. Extract the desired information from each of them.
  3. Store the scraped data in a Ruby data structure.

An HTML node selection strategy usually relies on an XPath expression or CSS Selector. You can use both in Selenium, but CSS selectors tend to be more intuitive than XPath expressions. For an in-depth comparison, check out our guide on CSS Selector vs. XPath.

Let's keep it simple and opt for CSS selectors. To figure out how to define the selectors for the data extraction goal, analyze the HTML code of a product card. Visit the target page in the browser, right-click on a product, and inspect it with DevTools:

DevTools inspection

Expand the HTML code of the DOM and note that:

  • Each product card is a <div> with a "post" class.
  • The product name is in an <h4> node.
  • The product price is in an <h5> element.

With the instructions below, you'll learn how to get the name and price from all product cards on the page.

Before diving into the scraping logic, add a custom Struct to represent the product items:

scraper.rb
Product = Struct.new(:name, :price)

Next, initialize an array. This will store the Product objects with the scraped data:

scraper.rb
products = []

Use the find_elements method with the :css option to select all HTML products via a CSS selector:

scraper.rb
html_products = driver.find_elements(:css, ".post")

Iterate over them, select the price and name, instantiate a new Product object, and add it to products:

scraper.rb
html_products.each do |html_product|
  # extracting the data of interest
  # from the current product HTML element
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text

  # initialize a new Product object with the scraped
  # data and add it to the list
  product = Product.new(name, price)
  products.push(product)
end

The text method allows you to get the price and name values. The Ruby Selenium library also provides methods for accessing HTML attributes and more.
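
As an illustration of attribute access, the sketch below reads text and attribute values from a single product card. The extract_product_details helper is a hypothetical name. It also assumes, without having checked the page, that each card contains an <a> link and an <img> element:

```ruby
# Hypothetical helper: collects text and attribute values from one product card.
# Assumption: the card contains an <a> link and an <img> (not verified here).
def extract_product_details(html_product)
  {
    name: html_product.find_element(:css, "h4").text,
    price: html_product.find_element(:css, "h5").text,
    # attribute returns the value of the given HTML attribute (or nil)
    url: html_product.find_element(:css, "a").attribute("href"),
    image: html_product.find_element(:css, "img").attribute("src")
  }
end

# Usage inside the loop above:
#   html_products.each { |html_product| p extract_product_details(html_product) }
```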

You can finally print the scraped data with:

scraper.rb
products.each do |product|
  puts "Price: #{product.price}\nName: #{product.name}\n\n"
end

Here's what scraper.rb should now contain:

scraper.rb
require "selenium-webdriver"

# define a custom type to represent the data to scrape
Product = Struct.new(:name, :price)

# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

# connect to the target page
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

# select all HTML product elements
html_products = driver.find_elements(:css, ".post")

# where to store the scraped data
products = []

# iterate over the list of HTML products
html_products.each do |html_product|
  # extracting the data of interest
  # from the current product HTML element
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text

  # initialize a new Product object with the scraped
  # data and add it to the list
  product = Product.new(name, price)
  products.push(product)
end

# print the scraped products
products.each do |product|
  puts "Price: #{product.price}\nName: #{product.name}\n\n"
end

# close the browser and release its resources
driver.quit

Execute it, and it'll produce:

Output
Price: $24.99
Name: Short Dress

Price: $29.99
Name: Patterned Slacks

# omitted for brevity

Price: $59.99
Name: Short Lace Dress

Price: $34.99
Name: Fitted Dress

Terrific! The Ruby browser automation parsing logic works like a charm.

Step 4: Export Data to CSV

It only remains to export the retrieved data to CSV. This will make it much easier to read, use, analyze, and share the collected data.

The best way to achieve that goal is to use the csv gem from the Ruby standard library. Import it in scraper.rb:

scraper.rb
require "csv"

Create an output CSV file with the open method and initialize it with the header row. Cycle over the products and populate the CSV file with the << operator:

scraper.rb
CSV.open("products.csv", "wb", write_headers: true, headers: ["name", "price"]) do |csv|
  # add each product as a new row
  products.each do |product|
    csv << product
  end
end
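
To check the export logic without launching a browser, you can try it on a couple of hard-coded rows. The sketch below uses CSV.generate, which takes the same options but returns a String instead of writing a file. The sample values come from the output shown earlier, and to_a is used here only to make the Struct-to-row conversion explicit:

```ruby
require "csv"

Product = Struct.new(:name, :price)

# sample rows taken from the output shown earlier
products = [
  Product.new("Short Dress", "$24.99"),
  Product.new("Patterned Slacks", "$29.99")
]

# build the CSV in memory, header row first
csv_text = CSV.generate(write_headers: true, headers: ["name", "price"]) do |csv|
  products.each { |product| csv << product.to_a }
end

puts csv_text

# parse it back to confirm that the header row maps to the right columns
rows = CSV.parse(csv_text, headers: true)
puts rows.first["name"]
```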

This is the final code of the Ruby Selenium WebDriver scraping script:

scraper.rb
require "selenium-webdriver"
require "csv"

# define a custom type to represent the data to scrape
Product = Struct.new(:name, :price)

# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

# connect to the target page
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

# select all HTML product elements
html_products = driver.find_elements(:css, ".post")

# where to store the scraped data
products = []

# iterate over the list of HTML products
html_products.each do |html_product|
  # extracting the data of interest
  # from the current product HTML element
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text

  # initialize a new Product object with the scraped
  # data and add it to the list
  product = Product.new(name, price)
  products.push(product)
end

# export the scraped products to CSV
CSV.open("products.csv", "wb", write_headers: true, headers: ["name", "price"]) do |csv|
  # add each product as a new row
  products.each do |product|
    csv << product
  end
end

# close the browser and release its resources
driver.quit

Run it:

Terminal
ruby scraper.rb

Wait for the script execution to be over. A products.csv file will appear in the root folder of your project. Open it, and you will notice the following data:

DATA OUTPUT

Great job! You now know the basics of Selenium with Ruby.

Note that the current output contains only ten rows. Why? Because the target page initially has ten products and loads more data via infinite scrolling as the user scrolls down.

This Selenium Ruby tutorial is far from over. Learn how to scrape all products in the next section!

Interact With a Browser With Ruby WebDriver

The selenium-webdriver gem can perform several web interactions. Those include waits, mouse movements, scrolls, and others. These interactions let your script handle dynamic content pages the way a human user would, which also makes it harder for anti-bot systems to detect.

Some of the most useful interactions supported by the Selenium WebDriver Ruby library are:

  • Click elements.
  • Move the mouse and simulate other mouse actions.
  • Wait for elements to be on the page or to be visible.
  • Scroll up and down the page.
  • Type data into input fields.
  • Submit forms.
  • Take screenshots.

Most of these actions are available via built-in methods. Otherwise, you can use the execute_script method to run JavaScript code directly on the page. That helps you simulate complex interactions that require custom logic.
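
To give an idea of what some of those built-in methods look like, here is a minimal sketch covering typing, form submission, screenshots, and mouse movement. The helper names, the CSS selectors, and the file name are hypothetical placeholders, as the demo page used in this tutorial has no search form:

```ruby
# Hypothetical helper: types a query into an input field, submits its form,
# and saves a screenshot. Selector and file name are placeholders.
def search_and_capture(driver, query, input_css: "input[name='q']", screenshot_path: "results.png")
  input = driver.find_element(:css, input_css)
  input.send_keys(query)                   # type data into the input field
  input.submit                             # submit the form the input belongs to
  driver.save_screenshot(screenshot_path)  # save a screenshot to a file
  screenshot_path
end

# Hypothetical helper: moves the mouse over the element matching the selector.
def hover(driver, css)
  element = driver.find_element(:css, css)
  driver.action.move_to(element).perform
  element
end

# Usage, assuming a page that actually has these elements:
#   search_and_capture(driver, "dress")
#   hover(driver, ".post")
```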

Find out how to retrieve each product in the infinite scroll demo page, and then see other popular Selenium Ruby interactions!

Scrolling

The destination page has only ten products on the first load, relying on infinite scrolling to load more. To get the complete list of products, you need to simulate the scroll-down user interaction.

Keep in mind that Selenium doesn't offer a dedicated method for scrolling to the bottom of a page (recent versions only expose lower-level wheel actions). Thus, the simplest approach is a custom JavaScript script!

The JS snippet below instructs the browser to scroll down the page 10 times at an interval of 0.5 seconds each:

JavaScript
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  // stop scrolling once the target number of scrolls is reached
  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)

Selenium will now scroll down the page as desired. However, the script may produce the same result as before. The reason is that loading and rendering new products take time, and you need to wait for that operation to complete.

As a solution, stop the script execution for 10 seconds:

scraper.rb
sleep(10)

Put it all together, and you'll get:

scraper.rb
require "selenium-webdriver"
require "csv"

# define a custom type to represent the data to scrape
Product = Struct.new(:name, :price)

# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

# connect to the target page
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

# scroll down the page to load all products
scrolling_script = "
    // scroll down the page 10 times
    const scrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
      window.scrollTo(0, document.body.scrollHeight)
      scrollCount++

      if (scrollCount === scrolls) {
          clearInterval(scrollInterval)
      }
    }, 500)
"
driver.execute_script(scrolling_script)

# wait for products to be loaded and rendered
sleep(10)

# select all HTML product elements
html_products = driver.find_elements(:css, ".post")

# where to store the scraped data
products = []

# iterate over the list of HTML products
html_products.each do |html_product|
  # extracting the data of interest
  # from the current product HTML element
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text

  # initialize a new Product object with the scraped
  # data and add it to the list
  product = Product.new(name, price)
  products.push(product)
end

# export the scraped products to CSV
CSV.open("products.csv", "wb", write_headers: true, headers: ["name", "price"]) do |csv|
  # add each product as a new row
  products.each do |product|
    csv << product
  end
end

# close the browser and release its resources
driver.quit

Run the Ruby script and wait for the scraping logic to complete:

Terminal
ruby scraper.rb

The script will take a while because of the 10-second wait. Open products.csv and verify that the output now contains all 60 products:

DATA OUTPUT

You just scraped all the products on the page. Hooray! 🥳
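
The fixed number of scrolls can also be replaced by a loop driven from Ruby that stops as soon as the page height no longer grows. This is a minimal sketch of that idea, not the approach used in the script above. The scroll_to_bottom helper and its max_scrolls and pause parameters are hypothetical names:

```ruby
# Hypothetical helper: scrolls to the bottom repeatedly and stops when
# the page height stops changing or max_scrolls is reached.
# Returns the last page height observed.
def scroll_to_bottom(driver, max_scrolls: 10, pause: 0.5)
  last_height = driver.execute_script("return document.body.scrollHeight")

  max_scrolls.times do
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    # give the page some time to load and render new products
    sleep(pause)

    new_height = driver.execute_script("return document.body.scrollHeight")
    # no growth means no new content was appended
    break if new_height == last_height

    last_height = new_height
  end

  last_height
end

# Usage with the driver from the script above:
#   scroll_to_bottom(driver, max_scrolls: 20, pause: 1)
```

A pause that is too short can end the loop before slow responses arrive, so the value still needs tuning for the target site.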

Wait for Element

The current Selenium Ruby script achieves the scraping goal defined at the beginning of the article. Is it over? Well, no…

The issue is that the script relies on a hard-coded wait (the sleep call), which is a discouraged practice in the automation world. The reason is simple: fixed waits make your scraping logic flaky by definition. Why? A simple network or computer slowdown can make the fixed delay too short and cause your scraper to fail, while on a fast connection the script idles for no reason.

The results of your scraping operation shouldn't depend on variable conditions like those. So, you need a better approach. Opt instead for explicit waits to verify the presence of a particular node on the DOM. This is a best practice because it makes your logic reliable, robust, and consistent.

Initialize a Wait object and use it to wait up to 10 seconds for the 60th product to be on the page:

scraper.rb
wait = Selenium::WebDriver::Wait.new(timeout: 10)
wait.until { driver.find_element(:css, ".post:nth-child(60)") }

Replace the sleep() instruction with the two lines above. The scraper will now wait for the AJAX calls triggered by the scrolls to update the DOM until it has all the products.
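
To see what an explicit wait does conceptually, consider this simplified pure-Ruby polling helper. It's only a sketch for illustration and not Selenium's implementation: the wait_until method and the WaitTimeoutError class are hypothetical names. Among other differences, the real Wait class also ignores "element not found" errors by default while polling:

```ruby
# Hypothetical error class raised when the condition is never met.
class WaitTimeoutError < StandardError; end

# Hypothetical helper: re-evaluates the block every `interval` seconds
# until it returns a truthy value or `timeout` seconds have passed.
def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    result = yield
    # return as soon as the condition holds: no time is wasted
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeoutError, "condition not met within #{timeout}s"
    end

    sleep(interval)
  end
end

# Usage idea with a driver: wait for at least 60 product cards.
#   wait_until(timeout: 10) { driver.find_elements(:css, ".post").size >= 60 }
```

Unlike sleep(10), the helper returns as soon as the condition is satisfied and fails loudly when it never is.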

This is the code of the definitive Ruby Selenium script:

scraper.rb
require "selenium-webdriver"
require "csv"

# define a custom type to represent the data to scrape
Product = Struct.new(:name, :price)

# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

# connect to the target page
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

# scroll down the page to load all products
scrolling_script = "
    // scroll down the page 10 times
    const scrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
      window.scrollTo(0, document.body.scrollHeight)
      scrollCount++

      if (scrollCount === scrolls) {
          clearInterval(scrollInterval)
      }
    }, 500)
"
driver.execute_script(scrolling_script)

# wait for the 60th product to be on the page
wait = Selenium::WebDriver::Wait.new(timeout: 10)
wait.until { driver.find_element(:css, ".post:nth-child(60)") }

# select all HTML product elements
html_products = driver.find_elements(:css, ".post")

# where to store the scraped data
products = []

# iterate over the list of HTML products
html_products.each do |html_product|
  # extracting the data of interest
  # from the current product HTML element
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text

  # initialize a new Product object with the scraped
  # data and add it to the list
  product = Product.new(name, price)
  products.push(product)
end

# export the scraped products to CSV
CSV.open("products.csv", "wb", write_headers: true, headers: ["name", "price"]) do |csv|
  # add each product as a new row
  products.each do |product|
    csv << product
  end
end

# close the browser and release its resources
driver.quit

Launch the script again, and it'll produce the same results as before but much faster. This time, the script will produce consistent results and only wait for the right amount of time.

Now that you know how to use Ruby with Selenium like a pro, it's time to dig into other useful interactions.

Wait for Page to Load

driver.navigate.to automatically waits for the browser to fire the page load event. In other terms, the Selenium Ruby gem already waits for pages to load by default.

Over time, pages have become more and more interactive. Telling when a webpage has actually loaded is no longer easy, and you may need advanced wait logic. Use the until method from Wait with custom logic for more control.

Get more information in the API documentation.
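
As an example of such custom logic, the sketch below treats a page as ready only when the browser reports the document as fully loaded and at least one element matching a given selector is present. The page_ready? helper is a hypothetical name, and which selector signals readiness depends on the page you're scraping:

```ruby
# Hypothetical helper: true when the document has finished loading
# and at least one element matches required_css.
def page_ready?(driver, required_css)
  loaded = driver.execute_script("return document.readyState") == "complete"
  # find_elements returns an empty array instead of raising when nothing matches
  loaded && driver.find_elements(:css, required_css).any?
end

# Usage with the Wait class shown earlier:
#   wait = Selenium::WebDriver::Wait.new(timeout: 10)
#   wait.until { page_ready?(driver, ".post") }
```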

Click Elements

The Element class offers the click method to simulate click interactions. Call it on an Element instance as follows:

scraper.rb
element.click

The browser will send a mouse click event on the node, triggering the HTML onclick() callback.

When click results in a page change (e.g., in the snippet below), you have to adapt the scraping logic to the new DOM:

scraper.rb
html_product_element = driver.find_element(:css, ".post")
html_product_element.click
# you are now on the detail product page...
    
# new scraping logic...

# driver.find_element(...)
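
Keep in mind that element references obtained before a page change become stale afterward. When you need to visit several product pages, one option is to collect their URLs first and then navigate to each of them. Here is a minimal sketch of that pattern. The visit_product_pages helper is a hypothetical name, and the ".post h4 a" selector assumes that each product name is wrapped in a link:

```ruby
# Hypothetical helper: collects the product links on the list page,
# then visits each one and records its page title.
# Assumption: product names are wrapped in <a> elements (".post h4 a").
def visit_product_pages(driver, link_css: ".post h4 a", limit: 3)
  # read the URLs first: element references go stale once the page changes
  urls = driver.find_elements(:css, link_css).map { |link| link.attribute("href") }

  urls.compact.first(limit).map do |url|
    driver.navigate.to(url)
    # new scraping logic for the detail page would go here
    { url: url, title: driver.title }
  end
end

# Usage with the driver from the scripts above:
#   visit_product_pages(driver, limit: 5).each { |page| puts page[:title] }
```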

Amazing! You're now the master of simulated interactions in the Selenium WebDriver Ruby gem.

Avoid Getting Blocked While Scraping with Selenium in Ruby

The biggest challenge in web scraping is getting blocked by anti-bot solutions. These technologies keep track of incoming requests and automatically stop automated ones.

If you make too many requests in a short time from the same IP, don't set the right headers, or act suspicious, you're likely to be banned. When that happens and you don't have a way to change your identity, it's game over!

A possible solution is to pass a real-world User-Agent string to the --user-agent Chrome option and a proxy URL to --proxy-server. That will hide your IP while making your requests appear to come from a regular browser. Learn more in our guide on Selenium user agent.

This is how you can set a User-Agent and proxy in the Selenium Ruby bindings:

scraper.rb
# set a custom user agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
options.add_argument("--user-agent=#{user_agent}")

# set a custom proxy
proxy_url = "<YOUR_PROXY_URL>"
options.add_argument("--proxy-server=#{proxy_url}")

These two tips are useful but they're just baby steps to do web scraping without getting blocked. Advanced solutions like Cloudflare will still be able to detect the automated nature of your requests and stop you with a CAPTCHA:

CAPTCHA Demo

Is there a truly effective solution? Of course, there is. Its name is ZenRows! As a scraping API, it seamlessly integrates with the selenium-webdriver gem to extend it with an anti-bot bypass system, IP and User-Agent rotation capabilities, and more.

Give your Ruby script superpowers by integrating it with ZenRows. Sign up for free to receive 1,000 credits. You'll also get access to the following Request Builder page:

building a scraper with zenrows

Now, assume you want to scrape data from the Cloudflare-protected G2.com page seen earlier.

In the "Build Your Request" section, paste your target URL (https://www.g2.com/products/airtable/reviews) into the "URL to Scrape" field. Check "Premium Proxy" to get rotating IPs and make sure the "JS Rendering" feature isn't turned on (as you don't want both ZenRows and Selenium to render the page).

On the right, select the "cURL" option to get the API URL you can use with any scraping tool. Copy the generated URL and pass it to the Selenium navigate.to method:

scraper.rb
require "selenium-webdriver"

# define the browser options
options = Selenium::WebDriver::Chrome::Options.new
# set a custom user agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
options.add_argument("--user-agent=#{user_agent}")
# set a custom proxy
proxy_url = "<YOUR_PROXY_URL>"
options.add_argument("--proxy-server=#{proxy_url}")
# to run Chrome in headless mode
options.add_argument("--headless") # comment out in development
# create a driver instance to control Chrome
# with the specified options
driver = Selenium::WebDriver.for :chrome, options: options

# connect to the target page
driver.navigate.to "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews&premium_proxy=true"

# extract the HTML of the target page
# and print it
html = driver.page_source
puts html

# close the browser and release its resources
driver.quit
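
If you'd rather assemble the API URL in Ruby than copy it from the Request Builder, the target URL must be percent-encoded. The sketch below rebuilds the URL shown above with the standard uri library. The zenrows_url helper is a hypothetical name, and the query parameters are limited to the ones that appear in that URL:

```ruby
require "uri"

# Hypothetical helper: rebuilds the API URL used above.
# encode_www_form percent-encodes the target URL for you.
def zenrows_url(api_key, target_url)
  query = URI.encode_www_form(
    apikey: api_key,
    url: target_url,
    premium_proxy: "true"
  )
  "https://api.zenrows.com/v1/?#{query}"
end

puts zenrows_url("YOUR_KEY", "https://www.g2.com/products/airtable/reviews")
```

You can then pass the returned string to driver.navigate.to as in the script above.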

Launch the script, and it'll print the source HTML of the G2.com page:

Output
<!DOCTYPE html>
<head>
  <meta charset="utf-8" />
  <link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
  <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>
  <!-- omitted for brevity ... -->

Wow, say bye bye to CAPTCHAs 👋. You just integrated ZenRows into your Selenium Ruby script.

What about all the other anti-bot traps that trigger after interaction via Selenium? The great news is that you no longer need Selenium. Take advantage of the JavaScript instructions feature and replace Selenium with ZenRows. That will also save you money, considering the infrastructure cost of running Selenium.

Ruby Selenium Alternative

Using Selenium with Ruby is easy, thanks to its intuitive API. The selenium-webdriver gem is a great tool, but you may want to explore some alternatives. The best Ruby Selenium alternatives are:

  • ZenRows: A cloud data extraction API that integrates with any technology and enables efficient and effective web scraping on any page.
  • Watir: An open-source Ruby library that uses browser automation for testing web applications and scraping sites. It can interact with a browser the same way people do.
  • Capybara: A Ruby package to talk with different drivers to execute automation scripts through the same clean and simple interface. It supports Selenium, Webkit, or pure Ruby drivers.

Conclusion

In this Selenium Ruby tutorial, you learned the fundamentals of browser automation in Ruby. You started from the basics of Selenium with Ruby and then dug into more advanced techniques. You've become a Selenium pro!

You're now able to set up a Ruby Selenium WebDriver project and know how to use the selenium-webdriver gem to collect data from dynamic content pages. You also saw what interactions you can simulate with it, the challenges of web scraping, and how to face them.

The problem is that anti-bots can block most browser automation scripts. Avoid that headache with ZenRows, a web scraping API with IP rotation, browser automation capabilities, and the most powerful anti-scraping toolkit available. Scraping dynamic content sites has never been easier.

Frequent Questions

What Is Ruby Selenium Used for?

Ruby Selenium is primarily used for automating web browsers. Specifically, it allows developers to perform automated testing and web scraping. It provides a complete API to simulate user interactions, such as clicking buttons, filling out forms, and navigating through different pages.

Can We Use Ruby in Selenium?

Yes, you can use Selenium with Ruby. Selenium is a versatile web automation tool that supports multiple programming languages, including Ruby. The Ruby Selenium bindings are available through a gem called selenium-webdriver.

Does Selenium Support Ruby?

Yes, Selenium supports Ruby. Selenium WebDriver, the core component of Selenium, provides bindings for Ruby. This means that Ruby developers can write browser automation scripts with Selenium.
