How to Use Nodriver for Web Scraping

June 26, 2024 · 10 min read

Are you getting blocked by anti-bots while web scraping with Python? Nodriver, an Undetected ChromeDriver fork, can help you bypass them.

In this article, you'll learn how nodriver works, how to use it for web scraping in Python, and how to help your scraper avoid blocks and bans.

Let's go!

Why Use Nodriver for Web Scraping?

Nodriver is a derivative of the Undetected ChromeDriver. It's a Python library developed to bypass CAPTCHAs and Web Application Firewalls (WAFs) like Cloudflare and Imperva during web scraping.

What distinguishes nodriver from the Undetected ChromeDriver is that it doesn't depend on Selenium or its ChromeDriver binary. Instead, it uses a real browser to automate web actions, which lets it successfully evade anti-bot detection and puts it a step ahead of the vanilla Selenium library.

What's more, nodriver lets you scrape websites across different browser tabs simultaneously, and its asynchronous execution makes it possible to scrape multiple pages concurrently. You also get cookie support to keep login sessions and extract data hidden behind a login. And since nodriver supports the Chrome DevTools Protocol (CDP), you can execute JavaScript and locate elements by text or CSS selectors.
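
To make the concurrency idea concrete, here's a minimal sketch of the `asyncio.gather` pattern that asynchronous scrapers rely on. It deliberately avoids real browser calls: `fetch_title` is a hypothetical stand-in that only simulates network latency, and the URLs are placeholders.

```python
import asyncio
import time


async def fetch_title(url: str, delay: float = 0.2) -> str:
    # hypothetical stand-in for a real page visit: it only simulates latency
    await asyncio.sleep(delay)
    return f"title of {url}"


async def scrape_concurrently(urls: list[str]) -> list[str]:
    # schedule every coroutine at once; results come back in input order
    return await asyncio.gather(*(fetch_title(url) for url in urls))


# placeholder URLs, used only to label the simulated results
urls = [f"https://example.com/page/{n}" for n in range(1, 4)]

start = time.perf_counter()
titles = asyncio.run(scrape_concurrently(urls))
elapsed = time.perf_counter() - start

print(titles)
print(f"finished in {elapsed:.2f}s")
```

Because the three simulated visits overlap, the total time should be close to a single delay rather than the sum of all three.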

Now, let's test all of these features in practice!

Tutorial: How to Build a Scraper With Nodriver

To learn how to use nodriver, you'll scrape product information from ScrapingCourse.com, a demo website with e-commerce features. You'll start with full-page HTML extraction. Then, you'll scrape specific product information before exporting it to a CSV file.

Here's what the target website looks like:

Scrapingcourse Ecommerce Store

Let's start with the prerequisites.

Prerequisites

This tutorial uses Python 3.12+. Ensure you download and install the latest Python version if you haven't already.

You'll also need to install the nodriver library using pip:

Terminal
pip install nodriver

This tutorial uses VS Code as the Integrated Development Environment (IDE), but feel free to code along with any IDE of your choice.

Step 1: Get the Page's HTML

To start, let's extract the website's full-page HTML. This is the most basic way to use nodriver and test whether it can access the target web page.

Nodriver is asynchronous by default, so you must import Python's asyncio alongside the nodriver library. Start a browser instance, visit the e-commerce demo page, and print its content:

scraper.py
# import the required libraries
import nodriver as uc
import asyncio

async def scraper():

    # start a new Chrome instance
    driver = await uc.start()

    # visit the target website
    page = await driver.get("https://www.scrapingcourse.com/ecommerce/")

    # get the full-page HTML
    html_content = await page.get_content()
    print(html_content)

    # close the page
    await page.close()

# run the scraper function with asyncio
if __name__ == "__main__":
    asyncio.run(scraper())

The code above outputs the website's full-page HTML as expected. The results below show the page title, with some omitted content for brevity:

Output
<!DOCTYPE html>
<html lang="en-US">
<head>
    <!--- ... --->
 
    <title>Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com</title>
   
  <!--- ... --->
</head>
<body class="home archive ...">
    <p class="woocommerce-result-count">Showing 1-16 of 188 results</p>
    <ul class="products columns-4">
       
        <!--- ... --->

    </ul>
</body>
</html>

Your scraper works! Now, let's target specific product data.
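
A quick sanity check on the returned HTML is to read its title, since anti-bot challenge pages often carry a different one. The sketch below uses only Python's built-in `html.parser`. `TitleParser` and `extract_title` are illustrative names, and the sample string stands in for the `html_content` variable from the scraper.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # only start collecting if no title has been captured yet
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html_content: str) -> str:
    parser = TitleParser()
    parser.feed(html_content)
    return parser.title.strip()


# shortened sample standing in for the HTML returned by the scraper
sample_html = """<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Ecommerce Test Site to Learn Web Scraping - ScrapingCourse.com</title>
</head>
<body></body>
</html>"""

page_title = extract_title(sample_html)
print(page_title)
```

If the title doesn't match what you see in a regular browser, the scraper probably received a block or challenge page instead of the real content.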

Step 2: Extract Product Data

Let's extract each product from its parent element using the nodriver CSS selector, starting with the names and prices of the products on the first page.

First, inspect the web page to expose its attributes. Right-click a product and select “Inspect” to open the inspection tool. You'll see that each product is in a list (li) tag:

Scrapingcourse Ecommerce Homepage Inspect First Page

Modify the previous code to extract all the list elements from the product page. Then, use Python's for loop to iterate through each container and write names and prices into a product data array using nodriver's query selector:

scraper.py
async def scraper():

    # ...

    # wait for the content to load
    await page.sleep(15)

    # extract all the product containers
    products = await page.select_all(".product")

    # product array to collect data
    product_data = []

    # loop through each container to extract names and prices
    for product in products:
        product_name = await product.query_selector(".woocommerce-loop-product__title")

        product_price = await product.query_selector(".price")

        # get all product texts into a dictionary
        data = {
            "Name": product_name.text_all,
            "Price": product_price.text_all
        }

        # append each product data to the product data array
        product_data.append(data)

    print(product_data)

Combine the code above with the previous snippet, and you'll get the following:

scraper.py
# import the required libraries
import nodriver as uc
import asyncio

async def scraper():

    # start a new Chrome instance
    driver = await uc.start()

    # visit the target website
    page = await driver.get("https://www.scrapingcourse.com/ecommerce/")

    # wait for the content to load
    await page.sleep(15)

    # extract all the product containers
    products = await page.select_all(".product")

    # product array to collect data
    product_data = []

    # loop through each container to extract names and prices
    for product in products:
        product_name = await product.query_selector(".woocommerce-loop-product__title")

        product_price = await product.query_selector(".price")

        # get all product texts into a dictionary
        data = {
            "Name": product_name.text_all,
            "Price": product_price.text_all
        }

        # append each product data to the product data array
        product_data.append(data)

    print(product_data)

    # close the page
    await page.close()

# run the scraper function with asyncio
if __name__ == "__main__":
    asyncio.run(scraper())

The scraper extracts the names and prices of all products:

Output
[
        {'Name': 'Abominable Hoodie', 'Price': '$ 69.00'},
        {'Name': 'Adrienne Trek Jacket', 'Price': '$ 57.00'},

        # other products omitted for brevity
         
        {'Name': 'Ariel Roll Sleeve Sweatshirt', 'Price': '$ 39.00'},
        {'Name': 'Artemis Running Short', 'Price': '$ 45.00'}
    ]

You've just extracted specific data from a single page using nodriver. Let's take it further by exporting the scraped content to CSV.
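
Before exporting, you may want the prices as numbers rather than strings like `'$ 69.00'`. Here's a minimal sketch of one way to do that with the built-in `decimal` module. `parse_price` is a hypothetical helper, not part of nodriver, and it assumes dollar-formatted prices like the ones in the output above.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw: str):
    # drop the currency symbol, thousands separators, and surrounding spaces
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # leave unparseable values for manual review
        return None


# rows shaped like the scraper output above
product_data = [
    {"Name": "Abominable Hoodie", "Price": "$ 69.00"},
    {"Name": "Adrienne Trek Jacket", "Price": "$ 57.00"},
]

parsed = [{**row, "Price": parse_price(row["Price"])} for row in product_data]
print(parsed)
```

`Decimal` avoids the rounding surprises of floats, which matters if you later sum or compare prices.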

Step 3: Export the Data as a CSV File

It's time to save your extracted data to a CSV file for further use. First, import the built-in csv module into the previous code. Then, add each product to a new row and save the CSV into your project directory:

scraper.py
# ... 
import csv

async def scraper():

    # ...

    # save the data to a CSV file
    keys = product_data[0].keys()
    with open("product_data.csv", "w", newline="", encoding="utf-8") as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(product_data)
        print("CSV created successfully")

Combine the snippets, and you'll get the following full code:

scraper.py
# import the required libraries
import nodriver as uc
import asyncio
import csv

async def scraper():

    # start a new Chrome instance
    driver = await uc.start()

    # visit the target website
    page = await driver.get("https://www.scrapingcourse.com/ecommerce/")

    # wait for the content to load
    await page.sleep(15)

    # extract all the product containers
    products = await page.select_all(".product")

    # product array to collect data
    product_data = []

    # loop through each container to extract names and prices
    for product in products:
        product_name = await product.query_selector(".woocommerce-loop-product__title")

        product_price = await product.query_selector(".price")

        # get all product texts into a dictionary
        data = {
            "Name": product_name.text_all,
            "Price": product_price.text_all
        }

        # append each product data to the product data array
        product_data.append(data)

    # save the data to a CSV file
    keys = product_data[0].keys()
    with open("product_data.csv", "w", newline="", encoding="utf-8") as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(product_data)
        print("CSV created successfully")

    # close the page
    await page.close()

# run the scraper function with asyncio
if __name__ == "__main__":
    asyncio.run(scraper())

The code generates the following CSV file:

Extracted Data in CSV  File

Your scraper now saves the extracted data to a CSV.
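
One thing to keep in mind: the export code reads `product_data[0]`, which raises an `IndexError` if the scrape returned nothing (for example, when the request was blocked). Below is a hedged sketch of a guard for that case. `save_to_csv` is an illustrative name, and the demo writes to a temporary directory so it doesn't touch your project files.

```python
import csv
import os
import tempfile


def save_to_csv(rows, path):
    # nothing to write: skip instead of failing on rows[0]
    if not rows:
        return False
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as output_file:
        writer = csv.DictWriter(output_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return True


# demo with a temporary directory so no project files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "product_data.csv")
    wrote_rows = save_to_csv([{"Name": "Abominable Hoodie", "Price": "$ 69.00"}], csv_path)
    wrote_empty = save_to_csv([], csv_path)
    with open(csv_path, newline="", encoding="utf-8") as saved_file:
        saved_rows = list(csv.DictReader(saved_file))

print(wrote_rows, wrote_empty, saved_rows)
```

The function returns `False` for an empty list and leaves any existing file untouched, so the caller can decide whether to retry or log the failure.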

However, the target website has more than one page, and the basic scraper only retrieves the products on the first one. Other websites load content dynamically through infinite scrolling, which the basic scraper can't handle either.

In the next section, you'll learn how to scrape paginated websites and dynamic web pages.

Advanced Web Scraping With Nodriver

In this section, you'll learn how to execute advanced data extraction tasks with nodriver, including scraping paginated websites and dynamic content like infinite scrolling.

Scrape Multiple Pages

The target website uses pagination to split its content across several pages. To scrape the whole website, you need to visit each page in turn and extract its data.

Right-click the "next" button on the target website's navigation bar and select Inspect to view its element:

Scrapingcourse Navbar Inspection

To implement pagination with nodriver, open a while loop where you'll execute your scraping and navigation logic.

Extract the next page element (.next) and keep clicking until it's no longer in the DOM. Ensure you wait for more elements to load before scraping. Modify the previous scraper function like this:

scraper.py
async def scraper():

    # ...

    while True:
        
        # ... scraping logic

        # find the "Next" button
        next_page_element = await page.query_selector(".next.page-numbers")

        if next_page_element:
            await next_page_element.click()
            # wait for the content to load
            await page.sleep(10)
        else:
            break

    # ... code to save data to a CSV file

Here's the final code after modification:

scraper.py
# import the required libraries
import nodriver as uc
import asyncio
import csv

async def scraper():
    # start a new Chrome instance
    driver = await uc.start()

    # visit the target website
    page = await driver.get("https://www.scrapingcourse.com/ecommerce/")

    # wait for the content to load
    await page.sleep(10)

    # product array to collect data
    product_data = []

    while True:
        # extract all the product containers
        products = await page.select_all(".product")

        # loop through each container to extract names and prices
        for product in products:
            product_name = await product.query_selector(".woocommerce-loop-product__title")
            product_price = await product.query_selector(".price")

            # get all product texts into a dictionary
            data = {
                "Name": product_name.text_all,
                "Price": product_price.text_all
            }

            # append each product data to the product data array
            product_data.append(data)

        # find the "Next" button
        next_page_element = await page.query_selector(".next.page-numbers")

        if next_page_element:
            await next_page_element.click()
            # wait for the content to load
            await page.sleep(10)
        else:
            break

    # save the data to a CSV file
    keys = product_data[0].keys()
    with open("product_data.csv", "w", newline="", encoding="utf-8") as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(product_data)
        print("CSV created successfully")

    # close the page
    await page.close()

# run the scraper function with asyncio
if __name__ == "__main__":
    asyncio.run(scraper())

The code above extracts product names and prices from all pages and exports them to a CSV file in your project directory.

Open the file, and you'll see that it now contains product information from the whole website:

Updated CSV File

Congratulations! You've just implemented pagination to scrape an entire website with the nodriver library.
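
The `while True` loop relies on the "Next" button eventually disappearing. If a selector keeps matching for any reason, the scraper would loop forever, so a page cap is a cheap safety net. Here's a minimal sketch of that idea. `paginate` and `FakeSite` are hypothetical names: in a real scraper, `scrape_page` would wrap the product extraction loop and `go_to_next` would wrap the `.next.page-numbers` click.

```python
import asyncio


async def paginate(scrape_page, go_to_next, max_pages=50):
    """Scrape pages until go_to_next() reports no next page or the cap is hit."""
    collected = []
    for _ in range(max_pages):
        collected.extend(await scrape_page())
        if not await go_to_next():
            break
    return collected


class FakeSite:
    """Hypothetical three-page site used only to exercise the loop."""

    def __init__(self):
        self.pages = [["A", "B"], ["C"], ["D", "E"]]
        self.index = 0

    async def scrape_page(self):
        return list(self.pages[self.index])

    async def go_to_next(self):
        # move forward only if another page exists
        if self.index + 1 < len(self.pages):
            self.index += 1
            return True
        return False


site = FakeSite()
all_items = asyncio.run(paginate(site.scrape_page, site.go_to_next))
print(all_items)
```

Passing the two steps in as callables keeps the loop independent of any particular library, which also makes it easy to test without launching a browser.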

Use Nodriver's JavaScript Rendering Capabilities

Nodriver can execute JavaScript, so it's useful for extracting data from dynamic websites such as those using infinite scrolling.

Let's see how this feature works by scraping product names and prices from the ScrapingCourse infinite scrolling challenge page. The target website loads more content as you scroll down.

See its layout below:

Infinite Scrolling Page

You'll simulate that scrolling effect to extract data continuously using nodriver.

Let's inspect the first product's container with its name and price. Each product is in a div element, as shown:

Infinite Scroll Demo Inspection

First, define your scraping logic in an asynchronous scraper function that uses a for loop. The goal is to iterate through each product container (.product-item) to extract its name and price. This function accepts a page argument, which is the page instance. It also requires the product data argument, representing the array with the extracted data:

scraper.py
# import the required libraries
import nodriver as uc
import asyncio

async def scraper(page, product_data):
    # extract all the product containers
    products = await page.select_all(".product-item")

    # loop through each container to extract names and prices
    for product in products:
        product_name = await product.query_selector(".product-name")
        product_price = await product.query_selector(".product-price")

        # get all product texts into a dictionary
        data = {
            "Name": product_name.text_all,
            "Price": product_price.text_all
        }

        # append each product data to the product data array
        product_data.append(data)

    # print the output data
    print(product_data)

The next step is writing your scrolling logic in an asynchronous scroller function. Start the page instance and set the product data array in this function. Execute the initial scrolling effect and get the initial page height:

scraper.py
async def scroller():
   
    # start a new Chrome instance
    driver = await uc.start()

    # visit the target website
    page = await driver.get("https://www.scrapingcourse.com/infinite-scrolling")

    # product array to collect data
    product_data = []

    # execute initial scrolling
    await page.scroll_down(1500)
    
    # get the value of the initial page height
    last_height = await page.evaluate("document.body.scrollHeight")

Extend the function with a while loop that scrolls continuously and waits for more content to load before each subsequent scroll. Get the new page height and compare it with the last one. Once the two heights are equal, there's no more content to load, so break the loop and execute the scraper function. Otherwise, update the last height with the new value:

scraper.py
    # ...

    while True:
        # scroll down the page
        await page.scroll_down(1500)

        # wait for content to load after scrolling
        await page.sleep(5)


        # get the value of the new page height
        new_height = await page.evaluate("document.body.scrollHeight")
        
        # if the height hasn't changed, it means we are at the bottom of the page
        if new_height == last_height:
            break

        # update the last height for the next iteration
        last_height = new_height

    # scrape the entire page after scrolling is complete
    await scraper(page, product_data)
   
    # close the page
    await page.close()

Finally, run the scroller function with asyncio:

scraper.py
# run the scroller function with asyncio
if __name__ == "__main__":
    asyncio.run(scroller())

Put the four snippets together. You'll get the following final code:

scraper.py
# import the required libraries
import nodriver as uc
import asyncio

async def scraper(page, product_data):
    # extract all the product containers
    products = await page.select_all(".product-item")

    # loop through each container to extract names and prices
    for product in products:
        product_name = await product.query_selector(".product-name")
        product_price = await product.query_selector(".product-price")

        # get all product texts into a dictionary
        data = {
            "Name": product_name.text_all,
            "Price": product_price.text_all
        }

        # append each product data to the product data array
        product_data.append(data)

    # print the output data
    print(product_data)

async def scroller():
   
    # start a new Chrome instance
    driver = await uc.start()

    # visit the target website
    page = await driver.get("https://www.scrapingcourse.com/infinite-scrolling")

    # product array to collect data
    product_data = []

    # execute initial scrolling
    await page.scroll_down(1500)


    # get the value of the initial page height
    last_height = await page.evaluate("document.body.scrollHeight")

    while True:
        # scroll down the page
        await page.scroll_down(1500)

        # wait for content to load after scrolling
        await page.sleep(5)
        
        # get the value of the new page height
        new_height = await page.evaluate("document.body.scrollHeight")


        # if the height hasn't changed, it means we are at the bottom of the page
        if new_height == last_height:
            break

        # update the last height for the next iteration
        last_height = new_height

    # scrape the entire page after scrolling is complete
    await scraper(page, product_data)
   
    # close the page
    await page.close()

# run the scroller function with asyncio
if __name__ == "__main__":
    asyncio.run(scroller())

The code above scrolls the entire page iteratively and obtains the names and prices of all its products:

Output
[
    {'Name': 'Chaz Kangeroo Hoodie', 'Price': '$52'},
    {'Name': 'Teton Pullover Hoodie', 'Price': '$70'},

    # ... other products omitted for brevity

    {'Name': 'Antonia Racer Tank', 'Price': '$34'},
    {'Name': 'Breathe-Easy Tank', 'Price': '$34'}
]

You've just scraped dynamically rendered data from a website with infinite scrolling using nodriver. Great job!
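
The scrolling loop stops the first time the page height is unchanged, which can end the scrape early if a batch of products takes longer than the wait to load. One possible refinement is to require several unchanged checks in a row. The sketch below shows that logic with a simulated page. `scroll_until_stable` and `FakePage` are hypothetical names, and with nodriver the three callables would wrap `page.evaluate`, `page.scroll_down`, and `page.sleep`.

```python
import asyncio


async def scroll_until_stable(get_height, scroll, wait, stable_rounds=2, max_scrolls=100):
    """Scroll until the height stays unchanged for `stable_rounds` checks in a row."""
    last_height = await get_height()
    unchanged = 0
    scrolls = 0
    while unchanged < stable_rounds and scrolls < max_scrolls:
        await scroll()
        await wait()
        scrolls += 1
        new_height = await get_height()
        # count consecutive unchanged checks; reset when the page grows
        unchanged = unchanged + 1 if new_height == last_height else 0
        last_height = new_height
    return scrolls


class FakePage:
    """Hypothetical page whose height grows on some scrolls, then stops."""

    def __init__(self):
        # one height value per scroll; the repeated 2000 mimics a slow load
        self.heights = [1000, 2000, 2000, 3000, 3000, 3000]
        self.position = 0

    async def get_height(self):
        return self.heights[self.position]

    async def scroll(self):
        self.position = min(self.position + 1, len(self.heights) - 1)

    async def wait(self):
        await asyncio.sleep(0)


page = FakePage()
total_scrolls = asyncio.run(scroll_until_stable(page.get_height, page.scroll, page.wait))
print(total_scrolls, page.heights[page.position])
```

With `stable_rounds=1`, the same simulated page would stop at a height of 2000 and miss the last batch. The trade-off is a few extra waits at the bottom of the page.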

However, despite these useful capabilities, nodriver still comes with a few limitations that can hinder your web scraping efforts.

Nodriver's Limitations

Nodriver is a valuable tool for building web scrapers and bots. Still, its anti-bot bypass ability, proxy support, and dynamic content extraction strength tend to fall short.

First, nodriver doesn't have built-in support for proxy configuration. Setting one up may require manual work with the Chrome DevTools Protocol (CDP), which can be challenging.
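
For proxies that don't need authentication, one workaround worth trying is Chrome's own `--proxy-server` launch flag. The sketch below only builds the flag. `build_proxy_args` is a hypothetical helper, the address is a placeholder, and the commented-out call assumes that `uc.start()` accepts a `browser_args` list, so check the nodriver version you have installed.

```python
def build_proxy_args(host: str, port: int, scheme: str = "http") -> list[str]:
    # Chrome reads the proxy address from the --proxy-server command-line flag
    return [f"--proxy-server={scheme}://{host}:{port}"]


# placeholder address from the documentation-only range (RFC 5737)
proxy_args = build_proxy_args("203.0.113.10", 8080)
print(proxy_args)

# assumption: uc.start() accepts a browser_args list of Chrome flags
# driver = await uc.start(browser_args=proxy_args)
```

This flag doesn't carry a username or password, so authenticated proxies would still need the manual CDP setup.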

Another potential issue is that nodriver doesn't bypass all anti-bot systems. Although it gets through to nowsecure in some instances, it fails with a heavily protected website like G2.

We ran a 100-iteration benchmark on nodriver's ability to bypass G2's Cloudflare protection, and none of its requests went through.

The library can also be slow when scraping highly dynamic websites (like those using infinite scrolling), and its small community and disorganized documentation make it less beginner-friendly.

So, how can you overcome these limitations and make sure you can scrape efficiently and uninterrupted?

Avoid Getting Blocked While Scraping With Nodriver

Many websites implement advanced anti-bot mechanisms to detect any suspicious activity. Unfortunately, nodriver can't avoid them in most cases.

The best way to mitigate all its drawbacks is to use a web scraping API like ZenRows. It's an all-in-one web scraping solution that fixes your request headers, auto-rotates premium proxies, and bypasses CAPTCHAs and any other anti-bot measure at scale.

ZenRows also features JavaScript instructions, allowing it to act as a headless browser for scraping dynamic websites like those using infinite scrolling.

Let's see how ZenRows works by scraping a heavily protected website, the G2 Reviews page.

Sign up to open the ZenRows Request Builder. Paste the target URL in the link box, toggle the Boost mode to JS Rendering, and activate Premium Proxy. Choose Python as your preferred language and select the API connection mode. Copy and paste the generated code into your Python script.

Here's the generated code:

scraper.py
# pip install requests
import requests

# request parameters
params = {
    "url": "https://www.g2.com/products/asana/reviews",
    "apikey": "<YOUR_ZENROWS_API_KEY>",
    "js_render": "true",
    "premium_proxy": "true",
}

# send your request and get the response text to print the extracted HTML
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

The code accesses the protected website and scrapes its full-page HTML:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>

Congratulations! You've just bypassed a Cloudflare-protected website using ZenRows.

Conclusion

In this article, you've seen how nodriver works and how to use it for content extraction. You've learned how to:

  • Scrape full-page HTML with nodriver.
  • Extract specific data from a single page.
  • Obtain content from multiple pages of a paginated website.
  • Export the extracted data to a CSV file.
  • Use nodriver to scrape dynamic content like infinite scrolling.

Despite all these capabilities, nodriver's shortcomings can easily result in the detection and blocking of your web scraper. To make sure it works smoothly, use ZenRows to bypass all anti-bot measures and scrape any website without getting blocked. Try ZenRows for free without a credit card, and get your API key with up to 1000 request credits!
