How to Set Up a Proxy With Watir

July 22, 2024 · 5 min read

Watir (Web Application Testing in Ruby) is a Selenium-powered, open-source family of Ruby libraries for automating web browsers.

While effective for web scraping in Ruby, it can still get blocked by websites with anti-bot measures.

In this tutorial, you'll learn how to set up Watir proxies to avoid detection and bans, and scrape the web uninterrupted. Let's go!

Frustrated that your web scrapers keep getting blocked?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

Set Up a Proxy With Watir

To get started, install the Watir gem:

Terminal
gem install watir

Next, import the gem in your script. Initialize a new Chrome browser instance in headless mode and navigate to HTTPBin, a website that returns the client's IP address. Finally, retrieve the page content and close the browser:

scraper.rb
require 'watir'

# initialize the browser
browser = Watir::Browser.new :chrome, headless: true

# navigate to the URL
url = 'https://httpbin.io/ip'
browser.goto(url)

# get the page content
page_content = browser.text
puts page_content

# close the browser
browser.close

The above code will print your machine's IP address:

Output
{
  "origin": "210.212.39.138:80"
}
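HTTPBin returns JSON, so if you need the bare IP rather than the raw page text, you can parse the response with Ruby's built-in JSON module. Here's a small sketch using the sample output above:

```ruby
require 'json'

# sample body returned by httpbin.io/ip (from the output above)
page_content = '{ "origin": "210.212.39.138:80" }'

# parse the JSON and strip the port to get the bare IP
data = JSON.parse(page_content)
ip = data['origin'].split(':').first
puts ip
```

In a real script, you'd pass `browser.text` to `JSON.parse` instead of the hardcoded string.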

Using this script for requests reveals your IP address, which is bad practice for web scraping. Since most websites monitor traffic, their anti-bot systems may detect and block your IP.

To mask your request, let's integrate proxies into the code.

You can grab a free proxy from the Free Proxy List website. Make sure to pick an HTTPS proxy, which works with both HTTP and HTTPS websites.

Define the proxy settings by replacing 8.219.97.248:80 with your actual proxy server address and port. Here, the same proxy is used for both HTTP and SSL connections.

scraper.rb
proxy = {
  http: '8.219.97.248:80',
  ssl:  '8.219.97.248:80'
}

Now, initialize the Chrome browser instance in headless mode, but this time with the specified proxy settings.

scraper.rb
# ...

browser = Watir::Browser.new :chrome, headless: true, proxy: proxy

# ...

After merging the snippets, your complete code should look like this:

scraper.rb
require 'watir'

# define proxy
proxy = {
  http: '8.219.97.248:80',
  ssl:  '8.219.97.248:80'
}

# initialize the browser
browser = Watir::Browser.new :chrome, headless: true, proxy: proxy

# navigate to the URL
url = 'https://httpbin.io/ip'
browser.goto(url)

# get the page content
page_content = browser.text
puts page_content

# close the browser
browser.close

This script will print the IP address of the proxy server to the console:

Output
{
  "origin": "8.219.97.248:80"
}

Congrats! The response matches the proxy server IP.

You now know the basics of using a proxy with Watir. Let's dive into more advanced concepts!

Add Rotating and Premium Proxies to Watir

If you make several requests from a specific IP address, your activity becomes easy to detect. Rotating proxies can help you distribute your requests across multiple IP addresses, making it harder for websites to block your scraper.

Let's integrate rotating proxies into your Watir script. You'll build a simple rotator that randomly selects a proxy from a predefined list for each browsing session.

First, grab some free proxies from the Free Proxy List website. Then, configure the Selenium WebDriver logger to log important messages and ignore unnecessary ones to reduce log noise.

scraper.rb
require 'watir'
require 'logger'

# list of proxies
proxies = [
  { http: '8.219.97.248:80', ssl: '8.219.97.248:80' },
  { http: '20.235.159.154:80', ssl: '20.235.159.154:80' },
  { http: '18.188.32.159:3128', ssl: '18.188.32.159:3128' },
  # ...
]

# configure Selenium WebDriver logger
logger = Selenium::WebDriver.logger
logger.ignore(:jwp_caps, :logger_info)

Define a function that randomly selects a proxy from the proxies list and returns it.

scraper.rb
# ...

# function to rotate proxies
def get_rotating_proxy(proxies)
  proxies.sample
end
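Random sampling can pick the same proxy twice in a row. If you'd rather walk through the list in order, Ruby's `Enumerable#cycle` gives you a simple round-robin alternative (a sketch using placeholder addresses from the list above):

```ruby
# placeholder proxies; replace with your own list
proxies = [
  { http: '8.219.97.248:80', ssl: '8.219.97.248:80' },
  { http: '20.235.159.154:80', ssl: '20.235.159.154:80' }
]

# cycle returns an endless enumerator over the list
rotation = proxies.cycle

first  = rotation.next
second = rotation.next
third  = rotation.next # wraps around to the first proxy again
```

Call `rotation.next` once per browser session to get the next proxy in order.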

Use the get_rotating_proxy() function to randomly select a proxy. Then, log the selected proxy for reference. As before, initialize a headless Chrome browser with the chosen proxy. Finally, navigate to the target website to retrieve the page content.

scraper.rb
# ...

begin
  # initialize the browser with a proxy
  proxy = get_rotating_proxy(proxies)
  logger.info("Using proxy: #{proxy}")
  browser = Watir::Browser.new :chrome, headless: true, proxy: proxy

  # navigate to the URL
  url = 'https://httpbin.io/ip'
  browser.goto(url)

  # get the page content
  page_content = browser.text
  puts page_content

rescue => e
  # handle error
  logger.error("An error occurred: #{e.message}")
ensure
  # close the browser if it was opened
  browser&.close
end

If an error occurs during execution, the rescue block catches and logs it. The ensure block guarantees the browser is closed properly, regardless of whether an error occurred.

This begin/rescue/ensure structure keeps your script robust: it handles errors and performs the necessary cleanup operations.
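The same pattern can be extended with Ruby's retry keyword to switch to a different proxy when one fails. Here's a browser-free sketch where a simulated request stands in for `browser.goto` and fails on the first two attempts:

```ruby
proxies = ['8.219.97.248:80', '20.235.159.154:80', '18.188.32.159:3128']
max_attempts = 3
attempts = 0

begin
  attempts += 1
  proxy = proxies.sample
  # simulated request: pretend the first two proxies are dead
  raise IOError, "proxy #{proxy} failed" if attempts < 3
  puts "Succeeded with #{proxy} on attempt #{attempts}"
rescue IOError => e
  # retry re-runs the begin block with a freshly sampled proxy
  retry if attempts < max_attempts
  puts "All #{max_attempts} attempts failed: #{e.message}"
end
```

In a real Watir script, you'd put the browser initialization and `goto` call inside the begin block so each retry gets a new proxy.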

Here's the complete code after merging all the above snippets:

scraper.rb
require 'watir'
require 'logger'

# list of proxies
proxies = [
  { http: '8.219.97.248:80', ssl: '8.219.97.248:80' },
  { http: '20.235.159.154:80', ssl: '20.235.159.154:80' },
  { http: '18.188.32.159:3128', ssl: '18.188.32.159:3128' },
  # ...
]

# configure Selenium WebDriver logger
logger = Selenium::WebDriver.logger
logger.ignore(:jwp_caps, :logger_info)

# function to rotate proxies
def get_rotating_proxy(proxies)
  proxies.sample
end

begin
  # initialize the browser with a proxy
  proxy = get_rotating_proxy(proxies)
  logger.info("Using proxy: #{proxy}")
  browser = Watir::Browser.new :chrome, headless: true, proxy: proxy

  # navigate to the URL
  url = 'https://httpbin.io/ip'
  browser.goto(url)

  # get the page content
  page_content = browser.text
  puts page_content

rescue => e
  # handle error
  logger.error("An error occurred: #{e.message}")
ensure
  # close the browser if it was opened
  browser&.close
end

You'll get a randomly selected proxy as output every time you run this code.

Here's the output after running the code three times:

Output
# request 1
2024-05-21 20:23:56 INFO Selenium Using proxy: {:http=>"8.219.97.248:80", :ssl=>"8.219.97.248:80"}
{
  "origin": "8.219.97.248:80"
}

# request 2
2024-05-21 20:24:08 INFO Selenium Using proxy: {:http=>"18.188.32.159:3128", :ssl=>"18.188.32.159:3128"}
{
  "origin": "18.188.32.159:3128"
}

# request 3
2024-05-21 20:25:45 INFO Selenium Using proxy: {:http=>"20.235.159.154:80", :ssl=>"20.235.159.154:80"}
{
  "origin": "20.235.159.154:80"
}

Fantastic! You successfully implemented the rotating proxies approach.

As mentioned before, free proxies aren't consistently reliable: they have a short lifespan and tend to be slow.
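Because free proxies die quickly, it can help to pre-filter your list before rotating. One approach (a sketch, not part of Watir) is a quick TCP connect check with Ruby's Socket library; `proxy_alive?` is a hypothetical helper name:

```ruby
require 'socket'

# hypothetical helper: returns true if a TCP connection to the
# proxy's host:port succeeds within the timeout
def proxy_alive?(address, timeout_seconds = 3)
  host, port = address.split(':')
  Socket.tcp(host, port.to_i, connect_timeout: timeout_seconds).close
  true
rescue StandardError
  false
end

# keep only proxies that currently accept connections
proxies = ['8.219.97.248:80', '20.235.159.154:80']
live_proxies = proxies.select { |p| proxy_alive?(p) }
```

A successful TCP connection only proves the port is open, not that the proxy forwards traffic correctly, so treat this as a cheap first filter.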

Another issue is that free proxies fail against advanced anti-bot measures. Test the proxy rotator logic against the G2 Reviews page, which uses anti-bot technologies:

scraper.rb
require 'watir'
require 'logger'

# list of proxies
proxies = [
  { http: '8.219.97.248:80', ssl: '8.219.97.248:80' },
  { http: '20.235.159.154:80', ssl: '20.235.159.154:80' },
  { http: '18.188.32.159:3128', ssl: '18.188.32.159:3128' },
  # ...
]

# configure Selenium WebDriver logger
logger = Selenium::WebDriver.logger
logger.ignore(:jwp_caps, :logger_info)

# function to rotate proxies
def get_rotating_proxy(proxies)
  proxies.sample
end

begin
  # initialize the browser with a proxy
  proxy = get_rotating_proxy(proxies)
  logger.info("Using proxy: #{proxy}")
  browser = Watir::Browser.new :chrome, headless: true, proxy: proxy

  # navigate to the URL
  url = 'https://www.g2.com/products/asana/reviews'
  browser.goto(url)

  # get the page content
  page_content = browser.text
  puts page_content

rescue => e
  # handle error
  logger.error("An error occurred: #{e.message}")
ensure
  # close the browser if it was opened
  browser&.close
end

Here's the output:

Output
2024-05-21 20:45:12 INFO Selenium Using proxy: {:http=>"8.219.97.248:80", :ssl=>"8.219.97.248:80"}

Sorry, you have been blocked
You are unable to access g2.com
Why have I been blocked?
This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.
What can I do to resolve this?
You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.
Cloudflare Ray ID: jk34g5k3g5523 • Your IP: Click to reveal • Performance & security by Cloudflare

Your request got blocked by Cloudflare!

To stay undetected by more advanced protection systems, you need premium proxies. They're consistently reliable, spare you from composing proxy lists manually, and can get past anti-bot systems and IP bans. If you're unsure where to get started, check out our list of the best premium proxy providers.

Let's learn how to use premium proxies with ZenRows, one of the most reliable premium proxy providers, as an example.

Sign up for free, and you'll get redirected to the Request Builder page.

Paste the same G2 Reviews URL in the URL to Scrape box. Enable JS Rendering and click on the Premium Proxies check box. Select Ruby as your language and click on the API tab to copy the API endpoint.

building a scraper with zenrows

Let's jump into the code!

Initialize the browser in headless mode and navigate to the ZenRows API endpoint targeting the G2 Reviews page. Then extract and print the page's HTML, and close the browser to conclude the scraping session.

Here's the final Watir script integrating the ZenRows premium proxies:

scraper.rb
require 'watir'

# initialize the browser in headless mode
browser = Watir::Browser.new :chrome, headless: true

# connect to the target page
url = 'https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fasana%2Freviews&js_render=true&premium_proxy=true'
browser.goto(url)

# get the page content
page_content = browser.html
puts page_content

# close the browser
browser.quit

The code accesses the protected website and extracts its HTML:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>
</html>

Great! You've just bypassed a protected website with ZenRows.
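Rather than hand-encoding the target URL inside the endpoint string, you can assemble it programmatically with Ruby's `CGI.escape`. The parameter names below mirror the endpoint used above:

```ruby
require 'cgi'

api_key = '<YOUR_ZENROWS_API_KEY>'
target  = 'https://www.g2.com/products/asana/reviews'

params = {
  'apikey'        => api_key,
  'url'           => target,
  'js_render'     => 'true',
  'premium_proxy' => 'true'
}

# percent-encode each value and join into a query string
query = params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join('&')
endpoint = "https://api.zenrows.com/v1/?#{query}"
puts endpoint
```

This makes it easy to swap in different target URLs without manually escaping `://` and `/` characters each time.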

Conclusion

This tutorial walked you through the whole process of configuring proxies in Watir. Now, you know how to:

  • Use proxies with Watir.
  • Set up a rotating proxy.
  • Use premium proxies.

Premium proxies boost your scraper's reliability, save you the hassle of finding and configuring proxies manually, and provide a foolproof anti-bot bypass that works even against the most powerful protection systems. Try them out with ZenRows, a complete web scraping toolkit. On top of premium proxies, ZenRows offers a headless browser, User Agent rotator, and anything else you need to extract data from the web.

Ready to get started?

Up to 1,000 URLs for free are waiting for you