How to Set a Proxy in Selenium Ruby

May 6, 2024 · 10 min read

Are you getting banned while scraping with Selenium in Ruby? You need a proxy for your Selenium scraper.

In this tutorial, you'll learn how to set up a single proxy and rotate proxies from a pool.

How to Use a Proxy in Selenium with Ruby?

A proxy server routes your requests through its own IP address, so the target website sees the proxy's IP instead of yours. You can use a single free proxy or rotate proxies from a list. For stronger anonymity, use residential proxies, whose IP addresses belong to real household connections.

In this section, you'll learn to add single, rotating, and residential proxies using Selenium while scraping with Ruby. In each case, you'll use https://httpbin.org/ip, a website that returns your current IP address.

Before starting, check your current IP without a proxy. Send a regular request to https://httpbin.org/ip using the code below:

scraper.rb
# import the required gems
require "selenium-webdriver"
require "nokogiri"

# define Selenium capabilities
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")

# create a driver object
driver = Selenium::WebDriver.for(:chrome, options: options)

# open the target website
driver.navigate.to "https://httpbin.org/ip"

# print the page source
puts Nokogiri::HTML(driver.page_source).text

# close browser
driver.quit

The code prints your machine's IP address:

Output
{
  "origin": "105.XXX.Y.ZZZ"
}
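
If you want the IP address as a plain string, for example to compare it across runs, you can parse the JSON that httpbin returns. The sketch below is one way to do that with Ruby's standard `json` library. `extract_origin` is a hypothetical helper name, the sample IP is a placeholder, and the code assumes the text you pass in is the JSON body shown above.

```ruby
require "json"

# hypothetical helper: return the "origin" value from httpbin's JSON text,
# or nil when the text isn't the expected JSON object
def extract_origin(page_text)
  data = JSON.parse(page_text)
  data.is_a?(Hash) ? data["origin"] : nil
rescue JSON::ParserError
  nil
end

# sample text in the same shape as the output above (placeholder IP)
sample = '{ "origin": "203.0.113.7" }'
puts extract_origin(sample)
```

In the scraper, you would pass it the page text, for instance `extract_origin(Nokogiri::HTML(driver.page_source).text)`.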

Next, let's set up a proxy to change that result.


Step 1: Use a Proxy in an HTTP Request

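Using a single proxy server is the most basic way to mask your request in Selenium. In this example, you'll add a single free proxy. Grab one from the Free Proxy List. Free proxies are short-lived, so the address used below may no longer work by the time you read this.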

To begin, import the required libraries into your Ruby file and specify the proxy server URL:

scraper.rb
# import the required gems
require "selenium-webdriver"
require "nokogiri"

# specify the proxy server
proxy_url = "http://72.10.160.174:22669"

Now, set up the WebDriver in headless mode. Add the proxy server address to the driver options:

scraper.rb
# ...

# configure the WebDriver options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")

# add the proxy server address
options.add_argument("--proxy-server=#{proxy_url}")
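
A malformed proxy address tends to fail only once the browser tries to load a page, so it can help to check the string before handing it to Chrome. The following is a minimal sketch using Ruby's standard `uri` library. `proxy_argument` and `ALLOWED_SCHEMES` are hypothetical names, and the list of accepted schemes is an assumption for this example rather than a complete list of what Chrome supports.

```ruby
require "uri"

# proxy schemes accepted by this sketch (an assumption, adjust as needed)
ALLOWED_SCHEMES = %w[http https socks5].freeze

# hypothetical helper: validate a proxy URL and build the Chrome argument
def proxy_argument(proxy_url)
  uri = URI.parse(proxy_url)
  unless ALLOWED_SCHEMES.include?(uri.scheme) && uri.host && uri.port
    raise ArgumentError, "invalid proxy URL: #{proxy_url}"
  end
  "--proxy-server=#{uri.scheme}://#{uri.host}:#{uri.port}"
rescue URI::InvalidURIError
  raise ArgumentError, "invalid proxy URL: #{proxy_url}"
end

puts proxy_argument("http://72.10.160.174:22669")
```

You could then write `options.add_argument(proxy_argument(proxy_url))` in place of the interpolated string.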

Start a Chrome browser instance with the specified driver options, navigate to the target web page, and print the page source in plain text to view the current IP address:

scraper.rb
# ...

# create a driver object
driver = Selenium::WebDriver.for(:chrome, options: options)

# open the target website
driver.navigate.to "https://httpbin.org/ip"

# print the page source
puts Nokogiri::HTML(driver.page_source).text

# close browser
driver.quit

After merging the snippets, your complete code should look like this:

scraper.rb
# import the required gems
require "selenium-webdriver"
require "nokogiri"

# specify the proxy server
proxy_url = "http://72.10.160.174:22669"

# configure the WebDriver options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")

# add the proxy server address
options.add_argument("--proxy-server=#{proxy_url}")

# create a driver object
driver = Selenium::WebDriver.for(:chrome, options: options)

# open the target website
driver.navigate.to "https://httpbin.org/ip"

# print the page source
puts Nokogiri::HTML(driver.page_source).text

# close browser
driver.quit

The code outputs the IP address of the proxy server used:

Output
{
  "origin": "72.10.160.174"
}
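
Rather than comparing the addresses by eye, you can check programmatically that the reported IP matches the proxy's host. This is a rough sketch using only the standard library. `proxy_applied?` is a hypothetical helper, and it assumes the proxy exits from the same IP you connect to, which is not true of every proxy, so a `false` result is a hint to investigate rather than proof that the proxy was skipped.

```ruby
require "json"
require "uri"

# hypothetical helper: true when httpbin's "origin" contains the proxy's host
def proxy_applied?(page_text, proxy_url)
  data = JSON.parse(page_text)
  return false unless data.is_a?(Hash)

  # tolerate a comma-separated list of addresses in "origin"
  origins = data["origin"].to_s.split(",").map(&:strip)
  origins.include?(URI.parse(proxy_url).host)
rescue JSON::ParserError, URI::InvalidURIError
  false
end

# sample text in the same shape as the output above
response_text = '{ "origin": "72.10.160.174" }'
puts proxy_applied?(response_text, "http://72.10.160.174:22669")
```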

You’ve just changed the IP address of your Selenium scraper in Ruby. Good job! 

However, using a single proxy for multiple requests increases the chance of anti-bot detection. Next, let's see how to avoid that by rotating proxies.

Step 2: Rotate Proxies in Selenium for Ruby

Proxy rotation lets you select a different proxy from a list for each request. This approach is better than using a static single proxy because it doesn’t repeat the same proxy address for several requests.

To start, build a list of proxies from the Free Proxy List. Here’s how to do it in your Ruby script:

scraper.rb
# import the required libraries
require "selenium-webdriver"
require "nokogiri"

# create a proxy list
proxies = [
  "http://50.174.214.221:80",
  "http://72.10.160.174:22669",
  "http://93.117.225.195:80",
  "http://13.48.109.48:3128"
]

Next, run the scraping logic, including the browser instance, inside a loop that repeats once for each proxy in the list. To do that, initialize an index, then use it on each iteration to select a proxy and move on to the next one:

scraper.rb
# ...

# initialize current index to 0
current_index = 0

# rotate proxies based on the proxy list length
proxies.length.times do

  # pick the proxy at the current index, then advance the index (wrapping around)
  proxy = proxies[current_index]
  current_index = (current_index + 1) % proxies.length
  
end
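
If you need more requests than you have proxies, Ruby's built-in `Array#cycle` gives the same round-robin behavior without manual index arithmetic. Below is a small sketch of that alternative, not the approach the rest of this tutorial uses. `proxies_for` is a hypothetical helper name, and the request count of 6 is arbitrary.

```ruby
proxies = [
  "http://50.174.214.221:80",
  "http://72.10.160.174:22669",
  "http://93.117.225.195:80",
  "http://13.48.109.48:3128"
]

# hypothetical helper: return the proxy to use for each of `count` requests,
# walking the list in order and wrapping around at the end
def proxies_for(proxies, count)
  proxies.cycle.first(count)
end

# show which proxy each request would use
proxies_for(proxies, 6).each_with_index do |proxy, i|
  puts "request #{i + 1} -> #{proxy}"
end
```

With four proxies and six requests, the fifth and sixth requests reuse the first and second proxies.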

Set up the WebDriver in headless mode and add a proxy server selected from the proxy list:

scraper.rb
# ...

# rotate proxies based on the proxy list length
proxies.length.times do
  
  # ...
  
  # set up a new WebDriver for each proxy
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument("--headless")

  # add each proxy from the current index
  options.add_argument("--proxy-server=#{proxy}")
  
end

Spin up a Chrome browser instance, open the target website, and print its page source to see your IP address. Finally, close the driver.

scraper.rb
# ...

# rotate proxies based on the proxy list length
proxies.length.times do
  
  # ...

  # start a Chrome instance
  driver = Selenium::WebDriver.for :chrome, options: options

  # navigate to the target website to view your IP address
  driver.get("https://httpbin.org/ip")

  # print the page source
  puts Nokogiri::HTML(driver.page_source).text

  # close the WebDriver
  driver.quit
  
end

Combine the snippets, and your final code should look like this:

scraper.rb
# import the required libraries
require "selenium-webdriver"
require "nokogiri"

# create a proxy list
proxies = [
  "http://50.174.214.221:80",
  "http://72.10.160.174:22669",
  "http://93.117.225.195:80",
  "http://13.48.109.48:3128"
]

# initialize current index to 0
current_index = 0

# rotate proxies based on the proxy list length
proxies.length.times do

  # pick the proxy at the current index, then advance the index (wrapping around)
  proxy = proxies[current_index]
  current_index = (current_index + 1) % proxies.length

  # set up a new WebDriver for each proxy
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument("--headless")

  # add each proxy from the current index
  options.add_argument("--proxy-server=#{proxy}")

  # start a Chrome instance
  driver = Selenium::WebDriver.for :chrome, options: options

  # navigate to the target website to view your IP address
  driver.get("https://httpbin.org/ip")

  # print the page source
  puts Nokogiri::HTML(driver.page_source).text

  # close the WebDriver
  driver.quit
end

This code runs a different browser instance for each proxy in the list:

Output
{
  "origin": "50.174.214.221"
}

# ... 2 more IPs omitted for brevity

{
  "origin": "13.48.109.48"
}

Your Selenium scraper in Ruby now uses a different proxy per request. Nicely done!

So far, you’ve only used free proxies in this tutorial. This approach works for learning the basics of proxy setup, but it’s usually ineffective for web scraping. You’ll most likely get blocked because free proxies have a short lifespan. 
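
Because free proxies die without warning, a single dead address can crash the whole rotation loop with an exception. One way to soften that is to try each proxy in turn and skip the ones that raise. The sketch below shows the idea with plain Ruby. `first_working_proxy` is a hypothetical helper, and the block stands in for your real scraping logic (creating the driver, loading the page, and quitting the driver).

```ruby
# hypothetical helper: yield each proxy to the block and return
# [proxy, block_result] for the first one that doesn't raise;
# return nil when every proxy fails
def first_working_proxy(proxies)
  proxies.each do |proxy|
    begin
      return [proxy, yield(proxy)]
    rescue StandardError => e
      warn "proxy #{proxy} failed: #{e.class}: #{e.message}"
    end
  end
  nil
end

# simulated run: the first proxy "fails", the second one "works"
proxies = ["http://50.174.214.221:80", "http://72.10.160.174:22669"]
result = first_working_proxy(proxies) do |proxy|
  raise "connection refused" if proxy.end_with?(":80")
  "page fetched via #{proxy}"
end

puts result.inspect
```

In a real scraper, remember to call `driver.quit` in an `ensure` clause inside the block so a failed proxy doesn't leave a browser process running.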

For instance, you can’t scrape a protected website such as the G2 Reviews page with the methods presented above. Try it out with the following code:

scraper.rb
# import the required libraries
require "selenium-webdriver"
require "nokogiri"

# create a proxy list
proxies = [
  "http://50.174.214.221:80",
  "http://72.10.160.174:22669",
  "http://93.117.225.195:80",
  "http://13.48.109.48:3128"
]

# initialize current index to 0
current_index = 0

# rotate proxies based on the proxy list length
proxies.length.times do

  # pick the proxy at the current index, then advance the index (wrapping around)
  proxy = proxies[current_index]
  current_index = (current_index + 1) % proxies.length

  # set up a new WebDriver for each proxy
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument("--headless")

  # add each proxy from the current index
  options.add_argument("--proxy-server=#{proxy}")

  # start a Chrome instance
  driver = Selenium::WebDriver.for :chrome, options: options

  # navigate to a protected website like G2
  driver.get("https://www.g2.com/products/asana/reviews")

  # print the page source
  puts Nokogiri::HTML(driver.page_source)

  # close the WebDriver
  driver.quit
end

Cloudflare blocks the request:

Output
<!DOCTYPE html>
<html class="no-js" lang="en-US">
<head>
  <title>Attention Required! | Cloudflare</title>
</head>
<body>
    
    <!-- ... -->

      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="block_headline">Sorry, you have been blocked</h1>
      </div>

    <!-- ... -->
    
</body>
</html>

Scraping didn't work despite rotating the proxies. A more dependable option is to use a premium proxy, which you'll learn to do in the next section.
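
When you scrape at volume, you won't be reading every response yourself, so it's useful to flag block pages automatically. Here is a minimal sketch that looks for the two strings visible in the blocked output above. `blocked?` and `BLOCK_MARKERS` are hypothetical names, and the marker list is an assumption based only on that one sample, so other block pages may need different markers.

```ruby
# strings taken from the blocked output above; other block pages may differ
BLOCK_MARKERS = [
  "Attention Required! | Cloudflare",
  "Sorry, you have been blocked"
].freeze

# hypothetical helper: true when the HTML contains any known block marker
def blocked?(html)
  BLOCK_MARKERS.any? { |marker| html.include?(marker) }
end

# shortened sample of the blocked response
sample_html = "<html><head><title>Attention Required! | Cloudflare</title></head></html>"
puts blocked?(sample_html)
```

You could call `blocked?(driver.page_source)` after each page load and move on to the next proxy when it returns `true`.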

Step 3: Get a Residential Proxy to Avoid Getting Blocked

Free proxies can get you blocked during scraping. You need a premium proxy with authentication to boost your chances of avoiding detection or IP bans. Check out our list of the best web scraping proxies that integrate with Selenium in Ruby.

Still, scraping multiple pages can trigger more advanced anti-bot systems, which will block you even after implementing premium proxies in Selenium.

The recommended solution is to use a web scraping API like ZenRows. It automatically configures premium proxies, fixes request headers and bypasses any blocks.

Let's use ZenRows and Ruby's Faraday as the HTTP client to access the website that blocked us in the previous section.

Sign up for free, and you'll get to the Request Builder. Paste the target URL in the link box, toggle the JS Rendering Boost mode, and activate Premium Proxies. Select Ruby as your preferred language. Then, copy and paste the generated code into your Ruby script.

ZenRows Request Builder

The generated code should look like this:

Example
# gem install faraday
require "faraday"

# format the URL
url =
  "https://api.zenrows.com/v1/?"\
  "apikey=<YOUR_ZENROWS_API_KEY>"\
  "&url=https://www.g2.com/products/asana/reviews"\
  "&js_render=true&premium_proxy=true"

# make your request
conn = Faraday.new()
conn.options.timeout = 180
res = conn.get(url, nil, nil)
print(res.body)
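
The generated code places the target URL inside the query string as-is. If your target URL has its own query parameters (for example `?page=2&sort=new`), they can be mistaken for API parameters unless the URL is percent-encoded. The sketch below builds the same request URL with Ruby's standard `URI.encode_www_form`. `zenrows_url` is a hypothetical helper name, and the parameters are the same four that appear in the generated code.

```ruby
require "uri"

# hypothetical helper: build the API request URL with an encoded target URL
def zenrows_url(api_key, target_url)
  query = URI.encode_www_form(
    "apikey" => api_key,
    "url" => target_url,
    "js_render" => "true",
    "premium_proxy" => "true"
  )
  "https://api.zenrows.com/v1/?#{query}"
end

puts zenrows_url("<YOUR_ZENROWS_API_KEY>", "https://www.g2.com/products/asana/reviews")
```

Pass the resulting string to `conn.get` in place of the hand-built `url`.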

The code opens the protected website and extracts its HTML:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews 2024</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>
</html>

You just bypassed an anti-bot protection with ZenRows. Congratulations! 

ZenRows also features JavaScript instructions for interacting dynamically with websites that use JavaScript to load content. Hence, you can replace Selenium with ZenRows and forget the technicalities of managing browser instances, authenticating proxies, or setting up a WebDriver.

Conclusion

In this article, you've learned how to set up a single proxy and how to change proxies per request through proxy rotation in a Selenium scraper written in Ruby.

Setting up a premium proxy may help avoid blocks, but it’s often not enough against advanced anti-bot solutions. A web scraping API like ZenRows is built to handle those systems, allowing you to scrape websites at scale without getting blocked. Try ZenRows for free!
