Are you getting banned while scraping with Selenium in Ruby? You need a proxy with your Selenium scraper.
In this tutorial, you'll learn how to set up a single proxy and rotate proxies from a pool.
How to Use a Proxy in Selenium with Ruby?
A proxy server routes your requests through its IP address, providing a degree of anonymity. You can use a single free proxy or rotate proxies from a list. However, implementing residential proxies will boost your anonymity.
In this section, you'll learn to add single, rotating, and residential proxies using Selenium while scraping with Ruby. In each case, you'll use https://httpbin.org/ip, a website that returns your current IP address.
Before starting, check your current IP without a proxy. Send a regular request to https://httpbin.org/ip using the code below:
# import the required gems
require "selenium-webdriver"
require "nokogiri"
# define Selenium capabilities
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# create a driver object
driver = Selenium::WebDriver.for(:chrome, options: options)
# open the target website
driver.navigate.to "https://httpbin.org/ip"
# print the page source
puts Nokogiri::HTML(driver.page_source).text
# close browser
driver.quit
The code prints your machine's IP address:
{
"origin": "105.XXX.Y.ZZZ"
}
Next, let's set up a proxy to change that result.
The free proxies used in this tutorial are unlikely to work at the time of reading because they have a short lifespan. You may need to use new ones from the Free Proxy List.
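Because free proxies expire quickly, it can help to check whether one still accepts connections before handing it to Selenium. The following is a minimal sketch using only Ruby's standard library; the `proxy_reachable?` helper and its 3-second default timeout are assumptions made for illustration, not part of Selenium. It only confirms that the port is open, not that the proxy will actually relay your traffic.

```ruby
require "socket"
require "uri"

# Hypothetical helper: return true if a TCP connection to the proxy's
# host and port opens within the timeout, false otherwise.
def proxy_reachable?(proxy_url, timeout: 3)
  uri = URI.parse(proxy_url)
  # a usable proxy URL needs both a host and a port
  return false unless uri.host && uri.port

  # Socket.tcp yields the connected socket and closes it after the block
  Socket.tcp(uri.host, uri.port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, URI::InvalidURIError
  # refused, timed-out, unresolvable, or malformed addresses all count as unreachable
  false
end
```

You could then filter a list before scraping, for example with `proxies.select { |proxy| proxy_reachable?(proxy) }`.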
Step 1: Use a Proxy in an HTTP Request
Using a single proxy server is the most basic way to mask your request in Selenium. In this example, you'll add a single free proxy. Grab one from the Free Proxy List.
To begin, import the required libraries into your Ruby file and specify the proxy server URL:
# import the required gems
require "selenium-webdriver"
require "nokogiri"
# specify the proxy server
proxy_url = "http://72.10.160.174:22669"
Now, set up the WebDriver in headless mode. Add the proxy server address to the driver options:
# ...
# configure the WebDriver options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# add the proxy server address
options.add_argument("--proxy-server=#{proxy_url}")
Start a Chrome browser instance with the specified driver options, navigate to the target web page, and print the page source in plain text to view the current IP address:
# ...
# create a driver object
driver = Selenium::WebDriver.for(:chrome, options: options)
# open the target website
driver.navigate.to "https://httpbin.org/ip"
# print the page source
puts Nokogiri::HTML(driver.page_source).text
# close browser
driver.quit
After merging the snippets, your complete code should look like this:
# import the required gems
require "selenium-webdriver"
require "nokogiri"
# specify the proxy server
proxy_url = "http://72.10.160.174:22669"
# configure the WebDriver options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# add the proxy server address
options.add_argument("--proxy-server=#{proxy_url}")
# create a driver object
driver = Selenium::WebDriver.for(:chrome, options: options)
# open the target website
driver.navigate.to "https://httpbin.org/ip"
# print the page source
puts Nokogiri::HTML(driver.page_source).text
# close browser
driver.quit
The code outputs the IP address of the proxy server used:
{
"origin": "72.10.160.174"
}
You've just changed the IP address of your Selenium scraper in Ruby. Good job!
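A mistyped proxy address usually surfaces only as a failed page load, so you may want to validate the URL before passing it to Chrome. Here is a small sketch, assuming a hypothetical `proxy_argument` helper and assuming you only use the `http`, `https`, `socks4`, and `socks5` schemes:

```ruby
require "uri"

# schemes this sketch accepts (an assumption for illustration)
ALLOWED_SCHEMES = %w[http https socks4 socks5].freeze

# Hypothetical helper: build the --proxy-server argument from a proxy URL,
# raising early if the scheme is unexpected or the host or port is missing.
def proxy_argument(proxy_url)
  uri = URI.parse(proxy_url)
  unless ALLOWED_SCHEMES.include?(uri.scheme) && uri.host && uri.port
    raise ArgumentError, "invalid proxy URL: #{proxy_url}"
  end

  "--proxy-server=#{uri.scheme}://#{uri.host}:#{uri.port}"
end

puts proxy_argument("http://72.10.160.174:22669")
```

The returned string can be passed straight to `options.add_argument` in place of the interpolated string used earlier.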
However, using a single proxy for multiple requests increases the chance of anti-bot detection. Next, let's see how to avoid that by rotating proxies.
Step 2: Rotate Proxies in Selenium for Ruby
Proxy rotation lets you select a different proxy from a list for each request. This approach is better than using a static single proxy because it doesn't repeat the same proxy address for several requests.
To start, create your list of proxies with the Free Proxy List. Here's how to do it in your Ruby script:
# import the required libraries
require "selenium-webdriver"
require "nokogiri"
# create a proxy list
proxies = [
"http://50.174.214.221:80",
"http://72.10.160.174:22669",
"http://93.117.225.195:80",
"http://13.48.109.48:3128"
]
Next, run the scraping logic, including the browser instance, inside a loop that repeats once per proxy in the list. To do that, initialize an index at 0. Then, on each iteration, step the index through the list, wrapping around at the end, and pick the corresponding proxy:
# ...
# initialize current index to 0
current_index = 0
# rotate proxies based on the proxy list length
proxies.length.times do
# pick the proxy at the current index
proxy = proxies[current_index]
# move to the next index, wrapping around at the end of the list
current_index = (current_index + 1) % proxies.length
end
Set up the WebDriver in headless mode and add a proxy server selected from the proxy list:
# ...
# rotate proxies based on the proxy list length
proxies.length.times do
# ...
# set up a new WebDriver for each proxy
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# add each proxy from the current index
options.add_argument("--proxy-server=#{proxy}")
end
Spin up a Chrome browser instance, open the target website, and print its page source to see your IP address. Finally, close the driver:
# ...
# rotate proxies based on the proxy list length
proxies.length.times do
# ...
# start a Chrome instance
driver = Selenium::WebDriver.for :chrome, options: options
# navigate to the target website to view your IP address
driver.get("https://httpbin.org/ip")
# print the page source
puts Nokogiri::HTML(driver.page_source).text
# close the WebDriver
driver.quit
end
Combine the snippets, and your final code should look like this:
# import the required libraries
require "selenium-webdriver"
require "nokogiri"
# create a proxy list
proxies = [
"http://50.174.214.221:80",
"http://72.10.160.174:22669",
"http://93.117.225.195:80",
"http://13.48.109.48:3128"
]
# initialize current index to 0
current_index = 0
# rotate proxies based on the proxy list length
proxies.length.times do
# pick the proxy at the current index
proxy = proxies[current_index]
# move to the next index, wrapping around at the end of the list
current_index = (current_index + 1) % proxies.length
# set up a new WebDriver for each proxy
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# add each proxy from the current index
options.add_argument("--proxy-server=#{proxy}")
# start a Chrome instance
driver = Selenium::WebDriver.for :chrome, options: options
# navigate to the target website to view your IP address
driver.get("https://httpbin.org/ip")
# print the page source
puts Nokogiri::HTML(driver.page_source).text
# close the WebDriver
driver.quit
end
This code runs a different browser instance for each proxy in the list:
{
"origin": "50.174.214.221"
}
# ... 2 more IPs omitted for brevity
{
"origin": "13.48.109.48"
}
Your Selenium scraper in Ruby now uses a different proxy per request. Nicely done!
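If you'd rather not manage the index yourself, Ruby's built-in `Array#cycle` offers an alternative way to express the same round-robin idea. This is a sketch of the selection step only, without Selenium; the `rotation` and `selected` names are chosen for illustration.

```ruby
# the same proxy list used in the rotation example
proxies = [
  "http://50.174.214.221:80",
  "http://72.10.160.174:22669",
  "http://93.117.225.195:80",
  "http://13.48.109.48:3128"
]

# Array#cycle without a block returns an enumerator that repeats the list forever
rotation = proxies.cycle

# take one proxy per request; the fifth call wraps around to the first proxy
selected = Array.new(5) { rotation.next }
puts selected
```

In a scraper, you would call `rotation.next` once per request and pass the result to the `--proxy-server` argument.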
So far, you've only used free proxies in this tutorial. This approach works for learning the basics of proxy setup, but it's usually ineffective for web scraping. You'll most likely get blocked because free proxies have a short lifespan.
For instance, you can't scrape a protected website such as the G2 Reviews page with the methods presented above. Try it out with the following code:
# import the required libraries
require "selenium-webdriver"
require "nokogiri"
# create a proxy list
proxies = [
"http://50.174.214.221:80",
"http://72.10.160.174:22669",
"http://93.117.225.195:80",
"http://13.48.109.48:3128"
]
# initialize current index to 0
current_index = 0
# rotate proxies based on the proxy list length
proxies.length.times do
# increase the index of each proxy address
# pick the proxy at the current index
proxy = proxies[current_index]
# move to the next index, wrapping around at the end of the list
current_index = (current_index + 1) % proxies.length
# set up a new WebDriver for each proxy
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# add each proxy from the current index
options.add_argument("--proxy-server=#{proxy}")
# start a Chrome instance
driver = Selenium::WebDriver.for :chrome, options: options
# navigate to a protected website like G2
driver.get("https://www.g2.com/products/asana/reviews")
# print the page source
puts Nokogiri::HTML(driver.page_source)
# close the WebDriver
driver.quit
end
The request gets blocked by Cloudflare:
<!DOCTYPE html>
<html class="no-js" lang="en-US">
<head>
<title>Attention Required! | Cloudflare</title>
</head>
<body>
<!-- ... -->
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="block_headline">Sorry, you have been blocked</h1>
</div>
<!-- ... -->
</body>
</html>
Scraping didn't work despite rotating the proxies. A more dependable option is to use a premium proxy, which you'll learn to do in the next section.
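When you rotate proxies, it helps to detect a block page programmatically instead of reading the HTML by eye. The sketch below checks for the two strings that appear in the Cloudflare output above; the `blocked?` and `first_unblocked_page` helpers are hypothetical, and other anti-bot systems would need their own markers.

```ruby
# Markers taken from the Cloudflare block page shown above (assumption:
# other protections use different text and need their own entries).
BLOCK_MARKERS = [
  "Attention Required! | Cloudflare",
  "Sorry, you have been blocked"
].freeze

# Hypothetical helper: true if the HTML contains any known block marker.
def blocked?(html)
  BLOCK_MARKERS.any? { |marker| html.include?(marker) }
end

# Hypothetical helper: try each proxy in turn and return the first page that
# isn't a block page, or nil if every proxy was blocked.
# The block receives a proxy URL and must return that page's HTML.
def first_unblocked_page(proxies)
  proxies.each do |proxy|
    html = yield(proxy)
    return html unless blocked?(html)
  end
  nil
end
```

Inside the block passed to `first_unblocked_page`, you would start a driver with the given proxy, load the page, and return `driver.page_source`.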
Step 3: Get a Residential Proxy to Avoid Getting Blocked
Free proxies can get you blocked during scraping. You need a premium proxy with authentication to boost your chances of avoiding detection or IP bans. Check out our list of the best web scraping proxies that integrate with Selenium in Ruby.
Still, scraping multiple pages can trigger more advanced anti-bot systems, which will block you even after implementing premium proxies in Selenium.
The recommended solution is to use a web scraping API like ZenRows. It automatically configures premium proxies, fixes request headers and bypasses any blocks.
Let's use ZenRows and Ruby's Faraday as the HTTP client to access the website that blocked us in the previous section.
Sign up for free, and you'll get to the Request Builder. Paste the target URL in the link box, toggle the JS Rendering Boost mode, and activate Premium Proxies. Select Ruby as your preferred language. Then, copy and paste the generated code into your Ruby script.
The generated code should look like this:
# gem install faraday
require "faraday"
# format the URL
url =
"https://api.zenrows.com/v1/?"\
"apikey=<YOUR_ZENROWS_API_KEY>"\
"&url=https://www.g2.com/products/asana/reviews"\
"&js_render=true&premium_proxy=true"
# make your request
conn = Faraday.new()
conn.options.timeout = 180
res = conn.get(url, nil, nil)
print(res.body)
The code opens the protected website and extracts its HTML:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
<title>Asana Reviews 2024</title>
</head>
<body>
<!-- other content omitted for brevity -->
</body>
</html>
You just bypassed an anti-bot protection with ZenRows. Congratulations!
ZenRows also features JavaScript instructions for interacting dynamically with websites that use JavaScript to load content. Hence, you can replace Selenium with ZenRows and forget the technicalities of managing browser instances, authenticating proxies, or setting up a WebDriver.
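If your target URL contains its own query string or special characters, concatenating it into the request URL by hand can produce an ambiguous address. One way to guard against that is to build the query with `URI.encode_www_form` from Ruby's standard library. This is a sketch rather than the code the Request Builder generates: the `zenrows_url` helper is hypothetical, the parameter names are the ones from the generated code above, and it assumes the API accepts a standard percent-encoded query string.

```ruby
require "uri"

# Hypothetical helper: build the API request URL with every parameter
# percent-encoded, so characters like "&" or "?" in the target URL are safe.
def zenrows_url(api_key, target_url)
  query = URI.encode_www_form(
    apikey: api_key,
    url: target_url,
    js_render: true,
    premium_proxy: true
  )
  "https://api.zenrows.com/v1/?#{query}"
end

# "KEY" is a placeholder, not a real API key
puts zenrows_url("KEY", "https://www.g2.com/products/asana/reviews")
```

You could pass the resulting string to `conn.get` in the Faraday example in place of the hand-built URL.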
Conclusion
In this article, you've learned how to set up a single proxy and change proxies per request through proxy rotation using a Selenium scraper in Ruby.
Setting up a premium proxy may help avoid blocks, but it's often not enough for advanced anti-bot solutions. Only a web scraping API like ZenRows can bypass any detection system, allowing you to scrape websites at scale without getting blocked. Try ZenRows for free!