How to Use Selenium in NodeJS (Tutorial 2024)

July 16, 2024 · 8 min read

Are you looking for a comprehensive tutorial on using Selenium with NodeJS? You're in the right place!

Selenium is one of the most popular browser automation tools for web scraping and testing. Its official Node.js library, Selenium WebDriver, allows you to control web browsers programmatically.

In this tutorial, you'll learn everything you need to get started with Selenium in NodeJS.

How to Use Selenium in NodeJS

You'll use the Selenium WebDriver's headless capabilities to scrape data from an infinite-scrolling webpage. You'll also parse the DOM and export the scraped data to a CSV file.

Let's get started!

Frustrated that your web scrapers get blocked again and again?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

Step 1: Install Selenium in NodeJS

Before installing Selenium, ensure you've installed NodeJS on your system. Run the following command in the terminal to display the installed NodeJS version (e.g., v20.10.0):

Terminal
node -v

If the terminal displays an error message, install NodeJS from the official website first.

Once you're ready, create a selenium-nodejs-project directory and a JavaScript file (scraper.js) for your example project.

Terminal
mkdir selenium-nodejs-project
cd selenium-nodejs-project
touch scraper.js

Next, initialize your NodeJS project using the npm init command:

Terminal
npm init
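
If you'd rather skip the interactive prompts, npm init -y accepts the defaults:

Terminal
npm init -y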

The selenium-webdriver library is the Selenium WebDriver implementation for NodeJS. You can install it using the following command:

Terminal
npm install selenium-webdriver
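
Recent versions of selenium-webdriver (roughly 4.6 and newer) include Selenium Manager, which downloads a matching browser driver automatically. If you're on an older version, you may need to install ChromeDriver yourself, for example via the chromedriver npm package:

Terminal
npm install chromedriver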

Now, you can start writing your NodeJS script to scrape data using the Selenium WebDriver.

Open the project directory in your favorite IDE. In the scraper.js file, add the following line of code to import the library:

scraper.js
const { Builder } = require('selenium-webdriver');

Awesome! It's time to start working on the code.

Step 2: Run Browser With Selenium in NodeJS

Selenium is widely known for its powerful browser automation capabilities. It supports most major browsers, including Chrome, Firefox, Edge, Opera, Safari, and Internet Explorer.

Since Chrome is the most popular and robust of these, it's the browser you'll use in this tutorial. Import the Chrome WebDriver in the scraper.js file:

scraper.js
const chrome = require('selenium-webdriver/chrome');

Create an async function to enclose the scraper logic since you'll be dealing with asynchronous operations.

scraper.js
// import statements

async function scraper() {
    // write your scraping logic here
}
 
scraper();

Next, initialize the headless Chrome browser. Headless browsers are web browsers without a graphical user interface (GUI). They allow you to interact with web pages and perform tasks programmatically.

scraper.js
async function scraper() {
   
    // set the browser options
    const options = new chrome.Options().addArguments('--headless');
 
    // initialize the webdriver
    const driver = new Builder().forBrowser('chrome').setChromeOptions(options).build();
 
 }
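
Note that recent Chrome releases also ship a newer headless mode. Depending on your Chrome version, you may prefer to pass --headless=new instead; the rest of the tutorial works either way:

scraper.js
// newer headless mode available in recent Chrome versions
const options = new chrome.Options().addArguments('--headless=new');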

Now, you're ready to navigate to any webpage. In our case, we'll be scraping the product data from the ScrapingCourse infinite scrolling challenge page.

Demo Page

It's good practice to use the try…catch…finally statement for efficient error handling. We'll use it to enclose our main scraping logic and handle errors/exceptions.

Navigate to the target webpage and get its complete HTML code.

scraper.js
    try {   
        // navigate to the target webpage
        await driver.get('https://www.scrapingcourse.com/infinite-scrolling');
  
        // extract HTML of target webpage
        const html = await driver.getPageSource();
        console.log(html);
  
    } catch (error) {  
        // handle error  
        console.error('An error occurred:', error);
    } finally {
        // quit browser session 
        await driver.quit();
    }

Here's how our scraper.js file looks right now:

scraper.js
// import statements
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function scraper() {
   
    // set the browser options
    const options = new chrome.Options().addArguments('--headless');
 
    // initialize the webdriver
    const driver = new Builder().forBrowser('chrome').setChromeOptions(options).build();
    
    try {   
        // navigate to the target webpage
        await driver.get('https://www.scrapingcourse.com/infinite-scrolling');
  
        // extract HTML of the target webpage
        const html = await driver.getPageSource();
        console.log(html);
  
    } catch (error) {  
        // handle error  
        console.error('An error occurred:', error);
    } finally {
        // quit browser session 
        await driver.quit();
    }
    
}
scraper();

The above script will print the following HTML in the terminal:

Output
<html lang="en"><head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Infinite Scroll Challenge to Learn Web Scraping - ScrapingCourse.com</title>

<!-- Omitted for brevity... -->
</body></html>

Great! You've just extracted the complete HTML of the target webpage.

Step 3: Extract Data From a Page

Once you have the complete HTML of the webpage, you can proceed with extracting the required data. In this case, let's parse the name and price of all the products on the page.

To accomplish the task, you must follow these steps:

  1. Analyze the DOM of the webpage using DevTools.
  2. Implement an effective node selection strategy to locate the products.
  3. Extract the required data and store them in JavaScript arrays/objects.

DevTools is an invaluable tool in web scraping. It helps you inspect the currently loaded HTML, CSS, and JavaScript. You can also get information about the assets the page has requested and their corresponding loading time.

CSS selectors and XPath expressions are the most reliable node selection strategies. You can use either of them to locate the elements, but in this tutorial, we'll use CSS selectors for simplicity.

Let's use DevTools to define the CSS selectors. Open the target webpage in your browser, and right-click > Inspect on the product element to open DevTools.

Inspect Element

You can observe that the individual product details are inside a div tag with the class name product-info. The product name is enclosed within the first span tag with the class name product-name, and the product price is within the second span tag with the class name product-price.

Use the above information to define the CSS selectors, and locate the products using the findElements() and findElement() methods. Then, use the getText() method to extract the inner text of each node and store the extracted names and prices in arrays.

scraper.js
const { Builder, By } = require('selenium-webdriver');
// ...


        // ...
        // locate the parent elements
        let parentElements = await driver.findElements(By.css('.product-info'));

        const namesArray = [];
        const pricesArray = [];

        for (let parentElement of parentElements) {
            // find child elements within the parent element
            let names = await parentElement.findElement(By.css('.product-name'));
            let prices = await parentElement.findElement(By.css('.product-price'));

            namesArray.push(await names.getText());
            pricesArray.push(await prices.getText());
         
        }

        console.log(namesArray);
        console.log(pricesArray);
        // ...

Here's how your scraper.js file should look right now:

scraper.js
const { Builder, By } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function scraper() {
   
    // set the browser options
    const options = new chrome.Options().addArguments('--headless');
 
    // initialize the webdriver
    const driver = new Builder().forBrowser('chrome').setChromeOptions(options).build();
    
    try {
        
        // navigate to the target webpage
        await driver.get('https://www.scrapingcourse.com/infinite-scrolling');

        // locate the parent elements
        let parentElements = await driver.findElements(By.css('.product-info'));

        const namesArray = [];
        const pricesArray = [];

        for (let parentElement of parentElements) {
            // find child elements within the parent element
            let names = await parentElement.findElement(By.css('.product-name'));
            let prices = await parentElement.findElement(By.css('.product-price'));

            namesArray.push(await names.getText());
            pricesArray.push(await prices.getText());
         
        }

        console.log(namesArray);
        console.log(pricesArray);

    } catch (error) {
        // handle error
        console.error('An error occurred:', error);
    } finally {
        // quit browser session
        await driver.quit(); 
    }
 
}
 
scraper();

Run the above code, and you'll get the following output in the terminal:

Output
[
  'Chaz Kangeroo Hoodie',
  'Teton Pullover Hoodie',
  'Bruno Compete Hoodie',
  'Frankie Sweatshirt',
  'Hollister Backyard Sweatshirt',
  'Stark Fundamental Hoodie',
  'Hero Hoodie',
  'Oslo Trek Hoodie',
  'Abominable Hoodie',
  'Mach Street Sweatshirt',
  'Grayson Crewneck Sweatshirt',
  'Ajax Full-Zip Sweatshirt'
]
[
  '$52', '$70', '$63',
  '$60', '$52', '$42',
  '$54', '$42', '$69',
  '$62', '$64', '$69'
]

Voila! You just successfully scraped the product details using Selenium and NodeJS.

Step 4: Export Data to CSV

You're now ready to export the scraped data to a CSV file.

Import the built-in Node.js fs module, which provides functions for working with the file system.

scraper.js
const fs = require('fs');

Then, initialize a string variable called productsData with the header line containing column names ("name,price\n").

scraper.js
       let productsData = "name,price\n";

Next, loop through the two arrays (namesArray and pricesArray) containing product names and prices. For each element in the arrays, append a line to productsData with the name and price separated by a comma.

scraper.js
        for (let i = 0; i < namesArray.length; i++) {
            productsData += `${namesArray[i]},${pricesArray[i]}\n`;
        }

Using the fs.writeFile() function, write the productsData string to a file named ProductDetails.csv. This function takes three arguments: the file name, the data to write, and a callback function that handles any errors encountered during the writing process.

scraper.js
        fs.writeFile("ProductDetails.csv", productsData, err => {
            if (err) {
                console.error("Error:", err);
            } else {
                console.log("Success!");
            }
        });
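
Alternatively, if you'd rather await the write than pass a callback, Node's promise-based fs API works too. Here's a small optional variation; keep it inside the async scraper() function:

scraper.js
        // optional: promise-based alternative to fs.writeFile()
        const fsPromises = require('fs').promises;
        await fsPromises.writeFile('ProductDetails.csv', productsData);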

Your final web scraping project code should look like the following:

scraper.js
const { Builder, By } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
const fs = require('fs');

async function scraper() {
   
    // set the browser options
    const options = new chrome.Options().addArguments('--headless');
 
    // initialize the webdriver
    const driver = new Builder().forBrowser('chrome').setChromeOptions(options).build();
    
    try {
        
        // navigate to the target webpage
        await driver.get('https://www.scrapingcourse.com/infinite-scrolling');

        // locate the parent elements
        let parentElements = await driver.findElements(By.css('.product-info'));

        let namesArray = [];
        let pricesArray = [];

        for (let parentElement of parentElements) {
            // find child elements within the parent element
            let names = await parentElement.findElement(By.css('.product-name'));
            let prices = await parentElement.findElement(By.css('.product-price'));

            namesArray.push(await names.getText());
            pricesArray.push(await prices.getText());      
        }

        console.log(namesArray);
        console.log(pricesArray);

        // export to csv file
        let productsData = "name,price\n";
        for (let i = 0; i < namesArray.length; i++) {
            productsData += `${namesArray[i]},${pricesArray[i]}\n`;
        }
        fs.writeFile("ProductDetails.csv", productsData, err => {
            if (err) {
                console.error("Error:", err);
            } else {
                console.log("Success!");
            }
        });


    } catch (error) {
        // handle error
        console.error('An error occurred:', error);
    } finally {
        // quit browser session
        await driver.quit(); 
    }
 
}
 
scraper();

Running the command node scraper.js in the terminal will create the ProductDetails.csv file. The CSV file will contain the following data:

Product CSV

Amazing! You now have the fundamental knowledge required to use Selenium in NodeJS.

The target page has many more products, but the current output includes only the first twelve. This is because the page initially loads just 12 products and uses infinite scrolling to load the rest.

You'll learn how to scrape all the products in the next section.

Interacting With Web Pages in a Browser With selenium-webdriver

When dealing with dynamic websites, you must interact with them like an average user would in a regular browser. Interactions on dynamic websites may include scrolling, clicking a button, filling out a form, moving the mouse, etc.

The selenium-webdriver Node.js library provides various browser interactions for automated testing and web scraping. Here are some of the key interactions supported by the library:

  • Click elements
  • Input text
  • Navigate to URLs
  • Navigate back and forward
  • Scrolling
  • Mouse actions
  • Keyboard actions
  • Wait for elements
  • Alert handling
  • Window handling
  • Frame and iFrame handling
  • Cookies handling
  • Configuring browser behavior
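
For instance, here's what a few of these interactions might look like in practice. This is only a sketch: the selectors and input values below are hypothetical placeholders, not elements from the tutorial's target page.

scraper.js
const { Builder, By, Key } = require('selenium-webdriver');
// ...

        // type into an input field and press Enter (placeholder selector)
        await driver.findElement(By.css('#search-input')).sendKeys('hoodie', Key.RETURN);

        // click a button (placeholder selector)
        await driver.findElement(By.css('.load-more-button')).click();

        // navigate back to the previous page
        await driver.navigate().back();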

In addition to the built-in methods provided by the library to perform interactions, you can also use the executeScript() method. This method lets you execute a JavaScript code snippet directly on the page.
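
For example, you can read values computed in the page or manipulate elements directly from the browser context. Here's a quick sketch (the second call assumes you've already located an element and stored it in a variable):

scraper.js
        // read the current page height from the browser context
        const height = await driver.executeScript('return document.body.scrollHeight');

        // pass a previously located element as an argument and scroll it into view
        await driver.executeScript('arguments[0].scrollIntoView()', someElement);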

Let's finish our Node.js Selenium scraping project by extracting all the product data from the webpage. Then, we'll see some other interactions.

Scrolling

Since our target webpage implements infinite scrolling, you need to scroll to the bottom of the page until no new elements are loaded.

The following code repeatedly scrolls to the bottom of the page, waits 3 seconds for new content to load, and then checks whether the page height has changed. If the height is the same as in the previous iteration, it assumes no more content is being loaded and breaks out of the loop.

scraper.js
        // loop to keep scrolling until no more content is loaded
        let lastHeight = 0;
        while (true) {
            // scroll to the end of the page
            await driver.executeScript('window.scrollTo(0, document.body.scrollHeight)');

            // wait for 3 seconds
            await driver.sleep(3000);

            // get the current height of the page
            const currentHeight = await driver.executeScript('return document.body.scrollHeight');

            // break the loop if no more content is loaded
            if (currentHeight === lastHeight) {
                break;
            }
            lastHeight = currentHeight;
        }

Integrate the above code snippet with the previous scraping script. Here's your new complete code:

scraper.js
const { Builder, By } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
const fs = require('fs');

async function scraper() {
   
    // set the browser options
    const options = new chrome.Options().addArguments('--headless');
 
    // initialize the webdriver
    const driver = new Builder().forBrowser('chrome').setChromeOptions(options).build();
    
    try {
        
        // navigate to the target webpage
        await driver.get('https://www.scrapingcourse.com/infinite-scrolling');

        // loop to keep scrolling until no more content is loaded
        let lastHeight = 0;
        while (true) {
            // scroll to the end of the page
            await driver.executeScript('window.scrollTo(0, document.body.scrollHeight)');

            // wait for 3 seconds
            await driver.sleep(3000);

            // get the current height of the page
            const currentHeight = await driver.executeScript('return document.body.scrollHeight');

            // break the loop if no more content is loaded
            if (currentHeight === lastHeight) {
                break;
            }
            lastHeight = currentHeight;
        }

        // locate the parent elements
        let parentElements = await driver.findElements(By.css('.product-info'));

        let namesArray = [];
        let pricesArray = [];

        for (let parentElement of parentElements) {
            // find child elements within the parent element
            let names = await parentElement.findElement(By.css('.product-name'));
            let prices = await parentElement.findElement(By.css('.product-price'));

            namesArray.push(await names.getText());
            pricesArray.push(await prices.getText());      
        }

        console.log(namesArray);
        console.log(pricesArray);

        // export to csv file
        let productsData = "name,price\n";
        for (let i = 0; i < namesArray.length; i++) {
            productsData += `${namesArray[i]},${pricesArray[i]}\n`;
        }
        fs.writeFile("ProductDetails.csv", productsData, err => {
            if (err) {
                console.error("Error:", err);
            } else {
                console.log("Success!");
            }
        });


    } catch (error) {
        // handle error
        console.error('An error occurred:', error);
    } finally {
        // quit browser session
        await driver.quit(); 
    }
 
}
 
scraper();

After executing this script, you'll have a ProductDetails.csv file containing details of all 187 items.

Updated CSV File

Congratulations! You successfully scraped all the required data from the target webpage.

Wait for Element

In some situations, such as network or browser slowdowns, your script might fail or produce inconsistent results.

Rather than waiting for a fixed time interval, prefer smart waits, like waiting for a specific node to be present or visible on the page. This ensures that the web elements are loaded properly before interacting with them, reducing the chances of element not found or element not interactable errors.

The following code snippet implements an explicit wait strategy. The until.elementsLocated method defines the condition for waiting, which ensures that the WebDriver waits until the specified elements are located or until the maximum timeout of 5000 milliseconds (5 seconds) is reached.

scraper.js
const { Builder, By, until } = require('selenium-webdriver');
// ...

// ...
        let parentElements = await driver.wait(until.elementsLocated(By.css('.your-css-selector')), 5000);

// ..
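
In the scraping script above, for example, you could wait for the product cards to be located instead of relying only on fixed driver.sleep() calls:

scraper.js
        // wait up to 5 seconds for the product cards to appear
        let parentElements = await driver.wait(until.elementsLocated(By.css('.product-info')), 5000);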

You can learn more about Selenium's Explicit Waits from the official documentation.

Wait for the Page to Load

Dynamic websites often have elements that load asynchronously or are added to the DOM after the initial page load. Your page load strategy should account for these dynamic elements to ensure you capture all the data you need while remaining efficient.

Selenium WebDriver allows you to set the page load strategy to control how WebDriver waits for page loads to complete. There are three possible strategies:

  • normal: WebDriver waits for the full page to load (including all its resources such as images, scripts, etc.) before considering the page load to be complete.
  • eager: WebDriver waits for the DOM access to be ready while other resources like images may still be loading.
  • none: WebDriver does not wait for the page to load at all. It's up to you to handle waiting for elements or other conditions manually.
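
In selenium-webdriver, the strategy is set through the browser capabilities before building the driver. Here's a minimal sketch, assuming the Capabilities API of current selenium-webdriver releases:

scraper.js
const { Builder, Capabilities } = require('selenium-webdriver');

// use the 'eager' strategy: wait for the DOM, not for every resource
const capabilities = Capabilities.chrome();
capabilities.setPageLoadStrategy('eager');

const driver = new Builder()
    .forBrowser('chrome')
    .withCapabilities(capabilities)
    .build();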

You can check out the official Selenium documentation for more information about the page load strategy.

Avoid Getting Blocked When Scraping With Selenium

One of the biggest challenges to web scraping is getting blocked by websites implementing anti-bot measures. To avoid this, you need to imitate a real browser and normal user behavior.

Let's see what happens when we try to scrape data from G2 Reviews (a website protected by Cloudflare) using our script that extracts the complete HTML of the page.

scraper.js
// import statements
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function scraper() {
   
    // set the browser options
    const options = new chrome.Options().addArguments('--headless');
 
    // initialize the webdriver
    const driver = new Builder().forBrowser('chrome').setChromeOptions(options).build();
    
    try {   
        // navigate to the target webpage
        await driver.get('https://www.g2.com/products/airtable/reviews');
  
        // extract HTML of target webpage
        const html = await driver.getPageSource();
        console.log(html);
  
    } catch (error) {  
        // handle error  
        console.error('An error occurred:', error);
    } finally {
        // quit browser session 
        await driver.quit();
    }
    
}
scraper();

Running this script will produce the following output:

Output
<html class="no-js" lang="en-US">
    <title>Attention Required! | Cloudflare</title>

    <!-- omitted for brevity -->

    <h1 data-translate="block_headline">Sorry, you have been blocked</h1>
    <h2 class="cf-subheadline"><span data-translate="unable_to_access">You are unable to access</span> g2.com</h2>

    <!-- omitted for brevity -->

    <div class="cf-column">
    <h2 data-translate="blocked_why_headline">Why have I been blocked?</h2>

    <p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
    </div>
    
    <!-- omitted for brevity -->
    
</html>

You got blocked! The website detected our scraping bot.

To avoid bot detection in Selenium, you can take measures like rotating IPs, premium proxies, rotating User-Agents, etc. Note that these approaches are just baby steps to bypass anti-bot solutions. Advanced anti-bot protection systems like Cloudflare would still be able to detect your bot.
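
For example, one small step is sending a custom User-Agent through the Chrome options. This is only a sketch (the UA string is an arbitrary example), and on its own it won't get past advanced protections:

scraper.js
// set a custom User-Agent alongside headless mode
const options = new chrome.Options().addArguments(
    '--headless',
    '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
);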

So what can you do? Use ZenRows!

ZenRows is a popular alternative to Selenium in NodeJS. This advanced web scraping API offers all the functionalities of Selenium and provides additional features such as rotating premium proxies, auto-rotating UAs, anti-CAPTCHA, and other tools to help you avoid getting blocked.

To get started with ZenRows, sign up on the platform and get your 1,000 free API credits. After signing up, you'll get redirected to the Request Builder page.

building a scraper with zenrows

Let's scrape data from the protected G2 Reviews page that you saw earlier.

Paste the target URL (https://www.g2.com/products/airtable/reviews) in the 'URL to Scrape' input field. Make sure the Premium Proxies checkbox is checked and JS rendering is enabled.

Click on the Try it button, and you'll get the following output:

Output
<!DOCTYPE html>
    <head>
        <meta charset="utf-8" />
        <link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
        <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>
        <meta content="78D210F3223F3CF585EB2436D17C6943" name="msvalidate.01" />

        <!-- Omitted for Brevity -->

Congrats! You successfully scraped the HTML source of the protected G2 Reviews page.

As you saw, the ZenRows web scraping API handles anti-bot measures efficiently.
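
You can also call the API directly from your NodeJS code. The Request Builder generates a ready-to-run snippet for your account; the sketch below only illustrates the idea, and the endpoint and parameter names are assumptions based on ZenRows' documented API, so copy the exact code from the builder:

File
// rough sketch of a ZenRows API request (Node 18+ for the global fetch)
(async () => {
    const params = new URLSearchParams({
        apikey: 'YOUR_ZENROWS_API_KEY', // placeholder
        url: 'https://www.g2.com/products/airtable/reviews',
        js_render: 'true',
        premium_proxy: 'true',
    });

    const response = await fetch(`https://api.zenrows.com/v1/?${params}`);
    console.log(await response.text());
})();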

It's even capable of replacing Selenium's functionality entirely. Selenium might seem like a free tool, but there are several hidden expenses to consider when using it for professional purposes: learning time, troubleshooting complexity, scaling costs, and more make its net cost significantly higher than ZenRows'.

Conclusion

In this Selenium NodeJS tutorial, you learned how to control headless Chrome for scraping and automation. You started with the basics and then moved on to the advanced concepts of web scraping using Selenium.

Now you know:

  • How to set up a NodeJS Selenium WebDriver project.
  • How to use it to scrape data from a dynamic website.
  • How to interact with dynamic content using Selenium.
  • The challenges of web scraping and how to deal with them.

No matter how good your scraping script is, anti-bot measures will still be able to block it. Avoid them all using the advanced ZenRows API. Try ZenRows for free!
