RSelenium for Web Scraping: Complete Tutorial [2024]

April 12, 2024 · 12 min read

Selenium is the most popular browser automation tool for testing and web scraping. The latter use case is especially useful in a data science-oriented language such as R. Selenium doesn't officially support R, but the community created the RSelenium port.

In this RSelenium tutorial, you'll learn the basics and then master more complex interactions.

Let's dive in!

Why Use RSelenium

Selenium is one of the most widely used browser automation libraries. That's due to its intuitive API, which supports multi-platform and multi-language browser automation. No wonder thousands of developers worldwide use it for testing and web scraping.

The project is so popular that the R community created a port called RSelenium. This library is open source and maintained by the community, which means it may not always be up to date. As of this writing, the latest release was in late 2022.

How to Use RSelenium

Follow the steps below and learn how to scrape data from an infinite-scrolling demo page. This page dynamically retrieves new products via JavaScript as you scroll down. Since it requires user interaction, it's a great target for Selenium.

infinite scrolling demo page

Time to learn the basics of Selenium in R!

Step 1: Install RSelenium

Before getting started, make sure you have R installed on your machine. Download the installer, follow the instructions, and add the R installation folder to your system's PATH.

You can now set up your R project. Create a folder called selenium-r-demo and access it in the terminal:

Terminal
mkdir selenium-r-demo
cd selenium-r-demo

Add a blank main.R file inside it, and load the project folder in your favorite R IDE. PyCharm with the R plugin or Visual Studio Code with the R extension are good choices.

To install RSelenium, you first need to start an R terminal. Do it with the command below:

Terminal
R

Or if you're a Windows user:

Terminal
R.exe

Launch this command in the R terminal to install the RSelenium library:

Terminal
install.packages("RSelenium")

You're ready to create your first RSelenium scraping script. Open main.R, and import the library:

main.r
require(RSelenium)

Next, you need to connect to a local Selenium server. The documentation recommends using Docker, but this approach no longer seems to work, most likely because the last release of RSelenium was in 2022 and browsers have evolved a lot since then.

Another approach involves the rsDriver() function, which downloads the Selenium Server and the browser drivers, then starts the server and a browser session. Use it as follows:

main.r
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest"
)

Your main.R file should now contain:

main.r
require(RSelenium)

selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest"
)

Execute it with the run button of your IDE or with this command:

Terminal
Rscript main.R

The script will log:

Output
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD

This means that rsDriver() is downloading the Selenium Server and browser driver binaries for you. If your R version differs from the one RSelenium was built with, you'll also get a warning like the following:

Output
package 'RSelenium' was built under R version 4.2.3

You can safely ignore it.

However, the script will likely fail with an error similar to this:

Output
[1] "Connecting to remote server"

Selenium message:session not created: This version of ChromeDriver only supports Chrome version 114
Current browser version is 121.0.6167.161 with binary path C:\Program Files\Google\Chrome\Application\chrome.exe
Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10'
System info: host: '<USER>', ip: '10.5.0.2', os.name: 'Windows 11', os.arch: 'amd64', os.version: '10.0', java.version: '17.0.5'
Driver info: driver.version: unknown

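The problem is that RSelenium hasn't been updated in a while, so rsDriver() can only download ChromeDriver versions up to 114.0.5735.90. Starting with version 115, ChromeDriver is distributed through the Chrome for Testing infrastructure, which the download helpers behind rsDriver() don't know about.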

To fix that, go to the ChromeDriver download page and get the driver matching your Chrome version. Then, open the folder where rsDriver() stores ChromeDriver. On macOS, it's:

Example
~/Library/Application Support/binman_chromedriver

Or if you are on Windows:

Example
C:\Users\<user>\AppData\Local\binman\binman_chromedriver\win32

Create a folder named after your exact Chrome version, and extract the downloaded ZIP file into it.


In this case, the 121.0.6167.161 folder will now contain the chromedriver.exe file.
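
If you need to repeat this fix on several machines, a small helper can compute the target folder for you. The sketch below only mirrors the two paths listed above (macOS and Windows); chromedriver_dir() is a hypothetical name, Linux isn't covered, and the exact subfolder may differ on your setup.

```r
# Hypothetical helper: return the folder where the ChromeDriver for a given
# Chrome version should go, based on the two binman paths in this tutorial.
chromedriver_dir <- function(chrome_version,
                             sysname = Sys.info()[["sysname"]],
                             local_app_data = Sys.getenv("LOCALAPPDATA")) {
  base <- switch(sysname,
    Windows = file.path(local_app_data, "binman", "binman_chromedriver", "win32"),
    Darwin = file.path("~/Library/Application Support", "binman_chromedriver"),
    stop("Only the macOS and Windows paths are covered, got: ", sysname)
  )
  # one subfolder per Chrome version, e.g. ".../121.0.6167.161"
  file.path(base, chrome_version)
}
```

You could then pass the result to dir.create(path, recursive = TRUE) and extract the downloaded archive with unzip(zipfile, exdir = path, junkpaths = TRUE), where junkpaths = TRUE drops the archive's inner folder so the driver executable lands directly in the version folder.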

Then, run main.R again. If everything went as expected, it'll open a Chrome window and log something like this:

Output
[1] "Connecting to remote server"
$acceptInsecureCerts
[1] FALSE

$browserName
[1] "chrome"

$browserVersion
[1] "121.0.6167.161"

$chrome
$chrome$chromedriverVersion
[1] "121.0.6167.85 (3f98d690ad7e59242ef110144c757b2ac4eef1a2-refs/branch-heads/6167@{#1539})"

# omitted for brevity...

Awesome! The local Selenium server is now working correctly.

Access the WebDriver client object from the selenium_server connection this way:

main.r
driver <- selenium_server$client

Finally, don't forget to close the client and the server before the end of the script:

main.r
driver$close()
selenium_server$server$stop()
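
If the scraping logic throws an error before these two lines run, the browser and the Selenium server keep running, and the server's port may still be busy on your next run. A minimal sketch that guarantees the cleanup uses base R's tryCatch() with a finally clause; run_with_cleanup() is a hypothetical name, not part of RSelenium.

```r
# Hypothetical wrapper: run `scrape` and always run `cleanup` afterward,
# whether `scrape` returns normally or throws an error.
run_with_cleanup <- function(scrape, cleanup) {
  tryCatch(scrape(), finally = cleanup())
}

# Usage with the objects from this tutorial:
# run_with_cleanup(
#   function() driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/"),
#   function() {
#     driver$close()
#     selenium_server$server$stop()
#   }
# )
```

The value returned by scrape() passes through unchanged, so you can return your data frame from it.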

Your initial main.R script will look like this:

main.r
require(RSelenium)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest"
)

# get the driver client object
driver <- selenium_server$client

# scraping logic...

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()

Great! This RSelenium tutorial can now go deep into web scraping!

Step 2: Scrape With RSelenium

Use the navigate() method to open the desired page in a controlled Chrome instance:

main.r
driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/")

Now, call the getPageSource() method to retrieve the source HTML of the page. Since this function returns a list, access the first element with [[1]]. Then, print the HTML in the terminal with print():

main.r
html <- driver$getPageSource()[[1]]
print(html)

Another aspect to consider is that you probably don't want to start Chrome with the GUI. That's especially true in production, where you want to use the headless mode to save resources. To configure Chrome in headless mode, use these lines:

main.r
# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out for debugging
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)
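
Commenting the --headless line in and out works, but it's easy to forget. As an alternative sketch, you could read the mode from an environment variable; HEADLESS and build_capabilities() are names chosen for this example, not part of RSelenium.

```r
# Hypothetical helper: build the capabilities list used above, enabling
# headless mode unless the HEADLESS environment variable is set to "false".
build_capabilities <- function(headless = Sys.getenv("HEADLESS", "true") != "false") {
  args <- if (headless) list("--headless") else list()
  list(chromeOptions = list(args = args))
}

# pass the result to rsDriver(..., extraCapabilities = build_capabilities())
```

On macOS or Linux, running HEADLESS=false Rscript main.R would then open a visible browser window for debugging without touching the code.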

Here's what the current main.R will contain:

main.r
require(RSelenium)

# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out in testing
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)

# get the driver client object
driver <- selenium_server$client

# navigate to the destination page
driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/")
# extract the HTML source of the page and
# log it
html <- driver$getPageSource()[[1]]
print(html)

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()

Comment out the --headless argument so that Chrome runs in headed mode, and execute the script:

Terminal
Rscript main.R

The RSelenium R library will launch Chrome and open the Infinite Scrolling demo page:


The R scraping script will also print the HTML associated with that page:

Output
[1] "<html class=\"h-full\"><head>\n    <meta charset=\"utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n    \n<meta name=\"description\" content=\"Learn to scrape infinite scrolling pages\"><title>Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub</title>"
    # omitted for brevity...

Here we go! That's exactly the source HTML code of the destination page.

Step 3: Extract the Data You Want

RSelenium provides everything you need to extract data from an HTML page. Suppose your R scraper's goal is to retrieve each product's name and price. You can achieve that with these three operations:

  1. Select the product name elements on the page and extract their text.
  2. Select the product price elements on the page and extract their text.
  3. Convert the scraped data into a useful R data structure, such as a data frame.

Selecting nodes requires an HTML node selection strategy, such as an XPath expression or CSS Selector. Selenium supports both, but CSS selectors are more intuitive than XPath expressions. For a head-to-head comparison, consult our guide on CSS Selector vs XPath.

Let's keep things simple and stick to CSS selectors. Analyze a product card's HTML code to determine how to define effective selectors. Visit the target page in the browser, right-click on a product, and inspect it with DevTools:


Read the HTML code of the selected DOM element and notice that:

  • The product names are in .post h4 nodes.
  • The product prices are in .post h5 elements.

Follow the instructions below and learn how to scrape that data from the page.

Use the findElements() method with the using = "css" option to apply a CSS selector and get the target nodes:

main.r
name_elements <- driver$findElements(using = "css", ".post h4")

Then, get the product names as a character vector by calling getElementText() on each element and flattening the results with unlist():

main.r
names <- unlist(lapply(name_elements, function(x) {
  x$getElementText()
}))

Do the same for the prices:

main.r
price_elements <- driver$findElements(using = "css", ".post h5")
prices <- unlist(lapply(price_elements, function(x) {
  x$getElementText()
}))
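
The lapply() plus unlist() combination works, but unlist() returns NULL rather than an empty character vector when no elements match, and it doesn't check what each call returned. A slightly stricter sketch uses vapply(); extract_texts() is a hypothetical helper, and it assumes, as the unlist() call above implies, that getElementText() returns a list holding one string.

```r
# Hypothetical helper: read the text of each element into a character vector.
# vapply() checks that every call yields exactly one string and returns
# character(0) (not NULL) when the list of elements is empty.
extract_texts <- function(elements) {
  vapply(elements, function(el) el$getElementText()[[1]], character(1))
}

# Usage with the tutorial's driver:
# names <- extract_texts(driver$findElements(using = "css", ".post h4"))
# prices <- extract_texts(driver$findElements(using = "css", ".post h5"))
```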

Use the R data.frame() function to aggregate all scraped data into the products dataframe:

main.r
products <- data.frame(
  names,
  prices
)

You can finally print it to verify that it contains the scraped data:

main.r
print(products)

Here's what main.R should now contain:

main.r
require(RSelenium)

# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out in testing
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)

# get the driver client object
driver <- selenium_server$client

# navigate to the destination page
driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/")

# select the name elements inside the product cards
# and extract their names
name_elements <- driver$findElements(using = "css", ".post h4")
names <- unlist(lapply(name_elements, function(x) {
  x$getElementText()
}))

# select the price elements inside the product cards
# and extract their prices
price_elements <- driver$findElements(using = "css", ".post h5")
prices <- unlist(lapply(price_elements, function(x) {
  x$getElementText()
}))

# aggregate the scraped data into a data frame
products <- data.frame(
  names,
  prices
)

# print the scraped products
print(products)

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()

Run it, and it'll produce:

Output
                    names prices
1             Short Dress $24.99
2        Patterned Slacks $29.99
3     Short Chiffon Dress $49.99
4  Off-the-shoulder Dress $59.99
5              V-neck Top $24.99
6     Short Chiffon Dress $49.99
7              V-neck Top $24.99
8              V-neck Top $24.99
9        Short Lace Dress $59.99
10           Fitted Dress $34.99

Amazing! The R Selenium browser automation parsing logic works like a charm.
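
The prices column contains strings such as "$24.99". If you plan to sort, average, or filter by price, you might convert them to numbers first. A minimal sketch, assuming prices use a dot as the decimal separator; parse_price() is a hypothetical helper:

```r
# Hypothetical helper: strip everything except digits and the decimal point,
# then convert to numeric. "$24.99" becomes 24.99 and "$1,299.00" becomes 1299.
parse_price <- function(prices) {
  as.numeric(gsub("[^0-9.]", "", prices))
}

# Usage with the scraped data:
# products$price_value <- parse_price(products$prices)
```

Prices written with a comma as the decimal separator would need a different pattern.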

Step 4: Export Data as CSV

The final step involves exporting the scraped data to CSV. This is a good format to make the data easier to read, use, analyze, and share.

Before converting products into CSV format, change the dataframe column names using names(). That allows you to customize the header row of your CSV file:

main.r
names(products) <- c("name", "price")

Export the products data frame to a CSV file called products.csv with the write.csv() function. To avoid writing R's row names as an extra unnamed column, set row.names to FALSE:

main.r
write.csv(
  products,
  file = "./products.csv",
  fileEncoding = "UTF-8",
  row.names = FALSE
)
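
To see what write.csv() produces with these options, you can read the file back with read.csv(). This sketch writes a two-row sample (values taken from the output above) to a temporary file instead of products.csv, so it runs without Selenium.

```r
# Round-trip check with a small sample shaped like `products`
sample_products <- data.frame(
  name = c("Short Dress", "Patterned Slacks"),
  price = c("$24.99", "$29.99")
)
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_products, file = csv_path, fileEncoding = "UTF-8", row.names = FALSE)

# read the file back: the header matches the names set with names(),
# and there is no extra row-name column
reloaded <- read.csv(csv_path, fileEncoding = "UTF-8")
print(names(reloaded))
```

Without row.names = FALSE, read.csv() would typically show an additional column named X holding the row numbers.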

Put it all together, and you'll get:

main.r
require(RSelenium)

# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out in testing
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)

# get the driver client object
driver <- selenium_server$client

# navigate to the destination page
driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/")

# select the name elements inside the product cards
# and extract their names
name_elements <- driver$findElements(using = "css", ".post h4")
names <- unlist(lapply(name_elements, function(x) {
  x$getElementText()
}))

# select the price elements inside the product cards
# and extract their prices
price_elements <- driver$findElements(using = "css", ".post h5")
prices <- unlist(lapply(price_elements, function(x) {
  x$getElementText()
}))

# aggregate the scraped data into a data frame
products <- data.frame(
  names,
  prices
)

# change the column names of the data frame
names(products) <- c("name", "price")
# export the data frame containing the scraped data to a CSV file
write.csv(
  products,
  file = "./products.csv",
  fileEncoding = "UTF-8",
  row.names = FALSE
)

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()

Launch your R script:

Terminal
Rscript main.R

Wait for the execution to complete, and a products.csv file will appear in the root folder of your project. Open it, and you'll see:


Wonderful! You just learned the basics of Selenium with R.

The problem is that the current output only contains the first ten products. The reason is simple: the destination page shows only a few products on the first load and retrieves more as the user scrolls down.

This RSelenium tutorial is far from over. Scrape all products in the next section!

Interact With a Browser With RSelenium

Selenium can simulate many web interactions, including waits, mouse movements, scrolls, etc. That's great for navigating pages as a human being would. A side benefit is that basic anti-bot systems are less likely to flag you as a bot.

Some of the most useful interactions supported by RSelenium are:

  • Clicks and other mouse actions.
  • Waits for elements to be present or visible on the page.
  • Scrolling up and down the page.
  • Filling out and submitting forms.
  • Taking screenshots.

Most of these operations are available via built-in methods. Alternatively, use the executeScript() method to run JavaScript code directly on the page. Thanks to both approaches, you can simulate any user interaction.

Learn how to scrape all products in the infinite scroll demo page, and then explore other valuable Selenium interactions!

Scrolling

The demo page initially has only ten products, relying on infinite scrolling to load more. To scrape all products, you thus need to mimic the scroll-down user interaction.

As RSelenium doesn't offer a built-in method for scrolling, you'll need a JavaScript script. The snippet below instructs the browser to scroll down the page 10 times, once every 0.5 seconds:

JavaScript
// scroll down the page 10 times
const numScrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === numScrolls) {
    clearInterval(scrollInterval)
  }
}, 500)

Store the above JavaScript script in a variable and run it on the page by passing it to executeScript():

main.r
scrolling_script <- "
    // scroll down the page 10 times
    const numScrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
      window.scrollTo(0, document.body.scrollHeight)
      scrollCount++
  
      if (scrollCount === numScrolls) {
          clearInterval(scrollInterval)
      }
    }, 500)
"
driver$executeScript(scrolling_script)
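
The setInterval() approach hands control back to R immediately, while the scrolling keeps going inside the page in the background. If you'd rather drive each scroll from R, a sketch could call executeScript() in a loop with a pause between calls; scroll_to_bottom() and its parameters are hypothetical, not RSelenium methods.

```r
# Hypothetical helper: scroll to the bottom `times` times, pausing `pause`
# seconds after each scroll so the page can request more products.
scroll_to_bottom <- function(driver, times = 10, pause = 0.5) {
  for (i in seq_len(times)) {
    driver$executeScript("window.scrollTo(0, document.body.scrollHeight);")
    Sys.sleep(pause)
  }
  invisible(times)
}

# Usage with the tutorial's driver:
# scroll_to_bottom(driver)
```

When the function returns, every scroll has been issued, but the last batch of products may still be loading, so you still need a wait afterward.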

Even though Selenium will now scroll down the page, you'll get the same result as before. That's because loading and rendering new products takes time. As a first solution, pause the script with a fixed wait:

main.r
Sys.sleep(10)

This will be your new RSelenium scroll down script:

main.r
require(RSelenium)

# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out in testing
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)

# get the driver client object
driver <- selenium_server$client

# navigate to the destination page
driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/")

# simulate the infinite scrolling interaction
scrolling_script <- "
    // scroll down the page 10 times
    const numScrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
      window.scrollTo(0, document.body.scrollHeight)
      scrollCount++

      if (scrollCount === numScrolls) {
          clearInterval(scrollInterval)
      }
    }, 500)
"
driver$executeScript(scrolling_script)

# wait for product to load
Sys.sleep(10)

# select the name elements inside the product cards
# and extract their names
name_elements <- driver$findElements(using = "css", ".post h4")
names <- unlist(lapply(name_elements, function(x) {
  x$getElementText()
}))

# select the price elements inside the product cards
# and extract their prices
price_elements <- driver$findElements(using = "css", ".post h5")
prices <- unlist(lapply(price_elements, function(x) {
  x$getElementText()
}))

# aggregate the scraped data into a data frame
products <- data.frame(
  names,
  prices
)

# change the column names of the data frame
names(products) <- c("name", "price")
# export the data frame containing the scraped data to a CSV file
write.csv(
  products,
  file = "./products.csv",
  fileEncoding = "UTF-8",
  row.names = FALSE
)

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()

Run the R script and wait for the scraping logic to complete:

Terminal
Rscript main.R

This time, the scraper will take some time because of the 10-second wait. Open products.csv and you'll now see all 60 products:


Terrific! You just scraped all the products on the page 🎊

Wait for Element

The current Selenium R script achieves the scraping goal defined at the beginning of the article. Yet, it can still be improved.

The problem is that fixed waits are discouraged. Why? They make your scraping logic flaky: if the page loads more slowly than expected, the scraper misses products, and if it loads faster, you waste time waiting!

Your scraping operation shouldn't depend on chance. Thus, use an explicit wait instead. Wait for the presence of the last product on the DOM before proceeding with the script execution. That's a best practice since it makes your logic more robust, reliable, and consistent.

Keep in mind that RSelenium doesn't provide the typical Selenium wait methods. So, use polling logic to simulate an explicit wait scenario:

main.r
for (x in 1:5) {
  tryCatch(
    {
      driver$findElement(using = "css", ".post:nth-child(60)")
      break
    },
    error = function(e) {
      # sleep for 2 seconds before retrying
      Sys.sleep(2)
    }
  )
}

Replace the Sys.sleep(10) call with these lines. The script will now poll the page until the AJAX calls triggered by the scrolls have added all 60 products to the DOM, making up to five attempts two seconds apart.
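
The loop above gives up silently after five attempts, and the script then carries on with whatever is on the page. If you'd prefer a failure you can't miss, a reusable polling helper could look like the sketch below; wait_until() and its timeout and interval parameters are hypothetical names, not RSelenium methods.

```r
# Hypothetical helper: call `condition` every `interval` seconds until it
# returns TRUE, or stop with an error once `timeout` seconds have passed.
# Errors raised by `condition` (e.g. "no such element") count as "not yet".
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ready <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ready) {
      return(invisible(TRUE))
    }
    if (Sys.time() >= deadline) {
      stop(sprintf("Condition not met within %s seconds", timeout))
    }
    Sys.sleep(interval)
  }
}

# Usage with the tutorial's driver (an empty result counts as "not ready"):
# wait_until(function() {
#   length(driver$findElements(using = "css", ".post:nth-child(60)")) > 0
# }, timeout = 10)
```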

This is the code of the definitive RSelenium scroll down script:

main.r
require(RSelenium)

# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out in testing
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)

# get the driver client object
driver <- selenium_server$client

# navigate to the destination page
driver$navigate("https://scrapingclub.com/exercise/list_infinite_scroll/")

# simulate the infinite scrolling interaction
scrolling_script <- "
    // scroll down the page 10 times
    const numScrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
      window.scrollTo(0, document.body.scrollHeight)
      scrollCount++

      if (scrollCount === numScrolls) {
          clearInterval(scrollInterval)
      }
    }, 500)
"
driver$executeScript(scrolling_script)

# wait for the 60th product to be on the page
# performing up to 5 attempts
for (x in 1:5) {
  tryCatch(
    {
      driver$findElement(using = "css", ".post:nth-child(60)")
      break
    },
    error = function(e) {
      # sleep for 2 seconds before retrying
      Sys.sleep(2)
    }
  )
}

# select the name elements inside the product cards
# and extract their names
name_elements <- driver$findElements(using = "css", ".post h4")
names <- unlist(lapply(name_elements, function(x) {
  x$getElementText()
}))

# select the price elements inside the product cards
# and extract their prices
price_elements <- driver$findElements(using = "css", ".post h5")
prices <- unlist(lapply(price_elements, function(x) {
  x$getElementText()
}))

# aggregate the scraped data into a data frame
products <- data.frame(
  names,
  prices
)

# change the column names of the data frame
names(products) <- c("name", "price")
# export the data frame containing the scraped data to a CSV file
write.csv(
  products,
  file = "./products.csv",
  fileEncoding = "UTF-8",
  row.names = FALSE
)

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()

Execute the script again, and it'll produce the same CSV file, typically faster. The reason is that the scraper now waits only as long as necessary.

Now that you've achieved the result like a pro, it's time to learn even more.

Wait for Page to Load

The navigate() method automatically waits for the browser to fire the page load event. While that's useful in most scenarios, it may not be enough when dealing with pages that retrieve data via AJAX.

As mentioned before, RSelenium doesn't expose explicit wait methods. So, you can either use custom polling logic as done before or call setImplicitWaitTimeout(). This sets the maximum amount of time the driver keeps searching for a node before a find operation fails:

main.r
driver$setImplicitWaitTimeout(milliseconds = 10000) # wait up to 10 seconds

Click Elements

After selecting an element, you can call the clickElement() method to simulate a click on it. Here's how to click a button with RSelenium:

main.r
button_element <- driver$findElement(using = "css", "button")
button_element$clickElement()

The browser will dispatch a mouse click event on the element, triggering any click handlers attached to it, such as an onclick callback.
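
On real pages, the element you want to click may not always be there (for example, an optional banner). A sketch that clicks the first matching element only if one exists follows; click_if_present() is a hypothetical helper built on the findElements() and clickElement() methods shown in this tutorial.

```r
# Hypothetical helper: click the first element matching `css`, if any.
# Returns TRUE when a click was sent and FALSE when nothing matched.
click_if_present <- function(driver, css) {
  elements <- driver$findElements(using = "css", css)
  if (length(elements) == 0) {
    return(FALSE)
  }
  elements[[1]]$clickElement()
  TRUE
}

# Usage with the tutorial's driver:
# click_if_present(driver, "button")
```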

Hooray! You're now a master of Selenium in R.

Avoid Getting Blocked When Scraping With RSelenium

The biggest challenges to your RSelenium scraping operation are anti-bot solutions. These technologies monitor incoming requests and can detect and stop automated ones. Simply put, they will block your R script!

How is this possible? If your scraper doesn't use the proper headers or makes too many requests in a short time, it'll attract the attention of anti-bots. When this happens, and you don't have a way to mask your identity, you'll get blocked!

A quick solution is to set a real-world User-Agent with the --user-agent flag and a proxy with --proxy-server. The first option makes your requests appear to come from a regular user's browser. The second hides your IP. Learn more in our guide on Selenium user agent.

Here's how to set a User-Agent and a proxy in RSelenium:

main.r
user_agent <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
proxy_url <- "<YOUR_PROXY_URL>"

browser_capabilities <- list(
  chromeOptions =
    list(
      args = list(
        "--headless",
        sprintf("--proxy-server=%s", proxy_url),
        sprintf("--user-agent=%s", user_agent)
      )
    )
)
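
To reuse this setup with or without a proxy, you could wrap the flags in a function that adds only the ones you pass. This is a sketch; chrome_capabilities() is a hypothetical name, and the flags are the same --headless, --proxy-server, and --user-agent shown above.

```r
# Hypothetical helper: build the Chrome capabilities for rsDriver(),
# adding the proxy and User-Agent flags only when a value is supplied.
chrome_capabilities <- function(user_agent = NULL, proxy_url = NULL) {
  args <- c(
    "--headless",
    if (!is.null(proxy_url)) sprintf("--proxy-server=%s", proxy_url),
    if (!is.null(user_agent)) sprintf("--user-agent=%s", user_agent)
  )
  list(chromeOptions = list(args = as.list(args)))
}

# Usage:
# rsDriver(browser = "chrome", chromever = "latest",
#          extraCapabilities = chrome_capabilities(user_agent, proxy_url))
```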

These two flags might help, but web scraping without getting blocked takes more than that. Sophisticated solutions like Cloudflare can still detect the automated nature of your requests. Try to scrape a Cloudflare-protected page, such as a G2.com product reviews page, and you'll receive a CAPTCHA like the one below:


Should you give up? Of course not. You only need the right tool, and its name is ZenRows! As an all-in-one scraping API, it seamlessly integrates with RSelenium and extends it with anti-bot bypass capabilities, IP and User-Agent rotation functionality, and much more.

Thanks to the ZenRows support for headless browser rendering, you can even replace Selenium with a simple R HTTP client.

Integrate your R script with ZenRows and equip it with a legendary weapon. Sign up for free to receive 1,000 credits and access the Request Builder page:

ZenRows Request Builder

Suppose you want to scrape data from a G2.com reviews page, which is protected by Cloudflare.

Paste your target URL (https://www.g2.com/products/airtable/reviews) into the "URL to Scrape" input. Check the "Premium Proxy" option for rotating IPs and make sure the "JS Rendering" feature isn't enabled (since you don't want both ZenRows and Selenium to render the page).

Select the “cURL” option on the right to retrieve the ZenRows endpoint URL you can call in any scraping tool. Copy the generated URL and pass it to the navigate() method:

main.r
require(RSelenium)

# to run Chrome in headless mode
browser_capabilities <- list(
  # comment out in testing
  chromeOptions =
    list(
      args = list("--headless")
    )
)

# start a selenium server and connect to Chrome
# using a local driver
selenium_server <- rsDriver(
  browser = "chrome",
  chromever = "latest",
  extraCapabilities = browser_capabilities
)

# get the driver client object
driver <- selenium_server$client

# navigate to the destination page
driver$navigate("https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews&premium_proxy=true")
# extract the HTML source of the page and
# log it
html <- driver$getPageSource()[[1]]
print(html)

# close the Selenium client and the server
driver$close()
selenium_server$server$stop()
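
The encoded URL above is easy to get wrong when you edit it by hand. As a sketch, you could build it in R with base R's URLencode(); zenrows_url() is a hypothetical helper, and it only covers the apikey, url, and premium_proxy parameters visible in the generated URL.

```r
# Hypothetical helper: build the ZenRows endpoint URL used above.
# URLencode(reserved = TRUE) percent-encodes ":" and "/" in the target URL.
zenrows_url <- function(target_url, api_key, premium_proxy = TRUE) {
  paste0(
    "https://api.zenrows.com/v1/?apikey=", URLencode(api_key, reserved = TRUE),
    "&url=", URLencode(target_url, reserved = TRUE),
    if (premium_proxy) "&premium_proxy=true" else ""
  )
}

# Usage (ZENROWS_API_KEY is an environment variable name chosen for this example):
# driver$navigate(zenrows_url(
#   "https://www.g2.com/products/airtable/reviews",
#   Sys.getenv("ZENROWS_API_KEY")
# ))
```

Keeping the API key in an environment variable also avoids hard-coding it in main.R.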

Execute the script, and it'll print the source HTML of the G2.com page:

Output
[1] "<!DOCTYPE html>\n    <head>\n <meta charset=\"utf-8\" />\n  <link href=\"https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico\" rel=\"shortcut icon\" type=\"image/x-icon\" />\n  <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>"
    ### omitted for brevity ...

The CAPTCHA is gone. 🤯 Mind-blowing 🤯

You just integrated ZenRows into your R script. This RSelenium tutorial is over!

Before saying "goodbye," what about all the other anti-bot traps? Interacting with the page in Selenium might trigger them! The great news is that you can replace Selenium entirely with ZenRows. That can also save you money, considering the infrastructure cost of running Selenium.

Conclusion

In this RSelenium scraping tutorial, you mastered the basics and then dug into advanced techniques. You're now an R Selenium ninja!

Here, you learned how to use Selenium in an R project and explored how to use it to get data from a dynamic content page. You saw what interactions you can simulate and how to face the challenges of web scraping.

The problem? Anti-bot technologies can block most browser automation scripts. Bypass them all with ZenRows, a web scraping API with the most powerful anti-bot bypass toolkit available, IP rotation, and browser automation capabilities. Scraping dynamic content sites has never been easier.

Frequent Questions

What is RSelenium Used for?

The RSelenium R library is mainly used for web automation. In detail, it enables you to create scripts that can control a browser for testing and web scraping. It offers an intuitive API to simulate user interactions, such as clicking buttons, filling out forms, and navigating pages.

Can We Use R With Selenium?

Yes, we can use Selenium with R. Selenium doesn't have official bindings for R, but the community has come to the rescue with RSelenium. This package brings the popular web automation tool to the R programming language.

Does Selenium Support R?

Not officially, but you can still use it from R. The official Selenium WebDriver project doesn't come with an R library, but you can find community-driven R bindings such as RSelenium. This means R developers can also write browser automation scripts with Selenium.
