How to Scrape Amazon With Selenium: Step-by-Step Tutorial

Sergio Nonide
Sergio Nonide
September 6, 2024 · 7 min read

Do you want to extract product data from Amazon using your Selenium-based web scraper? We've got you covered!

In this tutorial, you'll learn step-by-step how to scrape Amazon with Selenium in Python, including best practices to avoid getting blocked.

Let's get right to it!

Build an Amazon Product Scraper With Selenium

In this Python Selenium scraping tutorial, we'll scrape the following Amazon product page.

Amazon Logitech Mouse Product Page
Click to open the image in full screen

You'll start with a basic scraper to access the page before scraping the following product information:

  • Product name.
  • Price.
  • Description.
  • Images.
  • Rating.

Let's start with the prerequisites.

Step #1: Prerequisites

This tutorial assumes you've installed Python on your machine. Otherwise, install the latest version from the Python download page. 

In this tutorial, we'll automate the Chrome browser with Selenium. So, in addition to Selenium, you'll need the WebDriverManager to manage the ChromeDriver installation automatically.

Open your command line to your project directory and install Selenium and the WebDriverManager using pip:

Terminal
pip3 install selenium webdriver-manager

You can follow this tutorial with any suitable IDE. We'll use VS Code for this tutorial.

Did you get everything ready? You're now prepared to scrape some data from Amazon!

Step #2: Access the Amazon Page

We'll first build a basic scraper to access the target product page. This step is essential to check if your Selenium setup works correctly.

Setting up a basic Selenium scraper is simple. Initiate an initial ChromeDriver installation using the ChromeDriverManager and add it to the Selenium WebDriver Services. This installation only occurs the first time you run the code and doesn't apply to subsequent executions:

Example
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# install ChromeDriver and set up the driver instance
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

Open the target website and close the driver instance:

Example
# ...

# specify the target URL
target_url = (
    "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
)

# visit the target URL
driver.get(target_url)

# quit the driver instance
driver.quit()

Here's a combination of both snippets:

Example
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# install ChromeDriver and set up the driver instance
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# specify the target URL
target_url = (
    "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
)

# visit the target URL
driver.get(target_url)

# quit the driver instance
driver.quit()

The above code will open the target web page in a browser interface (non-headless mode), showing that your Selenium setup works. 

However, opening the interface increases memory overhead and isn't recommended for real-life scraping. For this tutorial, we'll run Selenium in headless mode.

To change the above to headless mode, introduce the ChromeOptions and add the headless option. Then, include that option as an argument in the driver instance:

Example
# ...

# set up Chrome options
options = webdriver.ChromeOptions()

# run Chrome in headless mode
options.add_argument("--headless=new")

# install ChromeDriver and set up the driver instance
driver = webdriver.Chrome(
    options=options, service=Service(ChromeDriverManager().install())
)

Modify the previous scraper with these changes, and here's your new basic Selenium scraper:

Example
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# set up Chrome options
options = webdriver.ChromeOptions()

# run Chrome in headless mode
options.add_argument("--headless=new")

# install ChromeDriver and set up the driver instance
driver = webdriver.Chrome(
    options=options, service=Service(ChromeDriverManager().install())
)

# specify the target URL
target_url = (
    "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
)

# visit the target URL
driver.get(target_url)

# quit the driver instance
driver.quit()

The above scraper now runs the Chrome browser without a user interface. Let's build on it to extract specific content.

Frustrated that your web scrapers are blocked once and again?
ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

Step #3: Scrape Amazon Product Details

To scrape specific product details, you'll need to select HTML elements from the target web page. Selenium has a built-in HTML parser that allows you to parse elements using CSS selectors based on the By class.

Before going ahead, add the By class to your imports:

Example
# import the required libraries


# ...

from selenium.webdriver.common.by import By

Locate and Scrape Product Name

Before scraping the product name, inspect its web element to reveal its CSS selectors. Open the target web page, right-click the product's name, and select Inspect.

The product name is a span tag inside an h2 element with an ID of title:

Amazon Product Page Name Inspection
Click to open the image in full screen

Now, create a dictionary to collect the extracted data. Extract the product's name into this dictionary using the By.ID and find_element methods:

Example
# ...

# extract the product name
product_name = driver.find_element(By.ID, "title").text

# create a dictionary to store scraped product data
data = {
    "Name": product_name,
}

# print the extracted data
print(data)

The code outputs the product's name as shown:

Output
{
    'Name': 'Logitech G502 HERO High Performance Wired Gaming Mouse, HERO 25K Sensor, 25,600 DPI, RGB, Adjustable Weights, 11 Programmable Buttons, On-Board Memory, PC / Mac'
}

You've just scraped your first Amazon product information! Let's move to the product's price.

Locate and Scrape Product Price

Similarly, let's inspect the price element to view its CSS selector. 

Since the target is the actual listing price, not the discounted one, right-click on the product listing price and select Inspect to open its element in the browser console.

The price element is inside a span tag with the class name a-offscreen as shown:

Amazon Product Page Price Element
Click to open the image in full screen

Using the find_element method to search the listing price element (by the class name a-offscreen) returns an empty string. That's because this element is buried inside multiple nodes, and searching the element inspection console returns several similar elements. 

We'll use JavaScript's querySelector via Selenium's execute_script method to select the price element more precisely. First, query the element using its immediate parent node. Then extract the listing price text from the query:

Example
# ...

# find the price element with JavaScript's querySelector
price_element = driver.execute_script(
    'return document.querySelector(".a-price.a-text-price span.a-offscreen")'
)

# get the text of the listing price
price = driver.execute_script("return arguments[0].textContent", price_element)

Insert the extracted price text into the data dictionary:

Example
# ...

# create a dictionary to store scraped product data
data = {
    # ...,
    "Price": price,
}

The code updates the result with the extracted price:

Output
{
    # ...
    'Price': '$79.99'
}

You now know how to use Selenium's execute_script method to interact with HTML directly. Keep going!

Locate and Scrape Product Description

Right-click the product's description (the "About this item" section) and click Inspect. You'll see that each description is a list (li) inside an unordered list (ul):

Amazon Product Page Description Element
Click to open the image in full screen

Extract the unordered list using its CSS selector and collect all its nodes (list tags):

Example
# ...

# extract the description list
description_list = driver.find_element(
    By.CSS_SELECTOR, "ul.a-unordered-list.a-vertical.a-spacing-mini"
)

# find all list items within the description list
description_items = description_list.find_elements(By.TAG_NAME, "li")

Create an empty description_data list to collect each description as a separate item. Loop through each list tag to extract its text content and append it to the empty list:

Example
# ...

# create an empty list to collect the descriptions
description_data = []

# collect and store all product description texts
for item in description_items:
    # get the text content of the span within the li
    description_text = item.find_element(By.TAG_NAME, "span").text.strip()
    description_data.append(description_text)

Add the description_data to the data dictionary:

Example
# ...

# create a dictionary to store scraped product data
data = {
    # ...,
    "Description": description_data,
}

The code adds the description data to the output as shown:

Output
{
    # ...,

    'Description': [
        'Hero 25K sensor through a software update from G HUB, this...,'

        # ... omitted for brevity,
       
        'Microprocessor: 32-bit ARM. Use Logitech G HUB to save your...'
       
    ],
}

Locate and Scrape Product Rating

Let's inspect the product rating to expose its elements and CSS selectors. Right-click the rating score below the product name and select Inspect.

The rating score is a span under a parent node with the ID acrPopover:

Amazon Product Page Rating Element
Click to open the image in full screen

You can easily extract the review score from this ID:

Example
# ...

# extract the rating score
rating = driver.find_element(By.ID, "acrPopover").text

Update the data dictionary with this extracted rating score:

Example
# ...

# create a dictionary to store scraped product data
data = {
    # ...,
    "Rating": ratings,
}

The code now adds the rating score to the extracted product data:

Example
{
    # ...,

    'Rating': '4.7',
}

There's one more piece of information left to scrape. Keep going!

Locate and Scrape Product Image

Let's collect the product's featured image. Right-click the product's main image and select Inspect. 

The featured image tag (img) is inside a div with an ID imgTagWrapperId:

Amazon Product Page Image Element
Click to open the image in full screen

Select the parent element containing the featured image, scrape the image from it, and extract the image src attribute to get its URL:

Example
# ...

# select the div element containing the featured image
image_element = driver.find_element(By.ID, "imgTagWrapperId")

# scrape the image tag from its parent div
product_image = image_element.find_element(By.TAG_NAME, "img")

# get the image src attribute
product_image_url = product_image.get_attribute("src")

Finally, insert the extracted image URL into the data dictionary:

Example
# ...

# create a dictionary to store scraped product data
data = {
    # ...,
    "Featured Image": product_image_url,
}

The code outputs the featured image URL, as shown:

Output
{
    # ...,

    'Featured Image': 'https://m.media-amazon.com/images/I/61mpMH5TzkL._AC_SY355_.jpg',
}

You've now extracted the target product details. Let's combine all the code snippets to get the following complete code:

Example
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

# set up Chrome options
options = webdriver.ChromeOptions()

# run Chrome in headless mode
options.add_argument("--headless=new")

# install ChromeDriver and set up the driver instance
driver = webdriver.Chrome(
    options=options, service=Service(ChromeDriverManager().install())
)

# specify the target URL
target_url = (
    "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
)

# visit the target URL
driver.get(target_url)

# extract the product name
product_name = driver.find_element(By.ID, "title").text

# find the price element with JavaScript's querySelector
price_element = driver.execute_script(
    'return document.querySelector(".a-price.a-text-price span.a-offscreen")'
)

# get the text of the listing price
price = driver.execute_script("return arguments[0].textContent", price_element)

# extract the description list
description_list = driver.find_element(
    By.CSS_SELECTOR, "ul.a-unordered-list.a-vertical.a-spacing-mini"
)

# find all list items within the description list
description_items = description_list.find_elements(By.TAG_NAME, "li")

# create an empty list to collect the descriptions
description_data = []

# collect and store all product description texts
for item in description_items:
    # get the text content of the span within the li
    description_text = item.find_element(By.TAG_NAME, "span").text.strip()
    description_data.append(description_text)

# extract the rating score
ratings = driver.find_element(By.ID, "acrPopover").text

# select the div element containing the featured image
image_element = driver.find_element(By.ID, "imgTagWrapperId")

# scrape the image tag from its parent div
product_image = image_element.find_element(By.TAG_NAME, "img")

# get the image src attribute
product_image_url = product_image.get_attribute("src")

# create a dictionary to store scraped product data
data = {
    "Name": product_name,
    "Price": price,
    "Description": description_data,
    "Rating": ratings,
    "Featured Image": product_image_url,
}

See the complete output below:

Output
{
    'Name': 'Logitech G502 HERO High Performance Wired Gaming Mouse, HERO 25K Sensor, 25,600 DPI, RGB, Adjustable Weights, 11 Programmable Buttons, On-Board Memory, PC / Mac',
    'Price': '$79.99'
    'Description': [
        'Hero 25K sensor through a software update from G HUB, this...,

        #... omitted for brevity

        'Microprocessor: 32-bit ARM. Use Logitech G HUB to save your...'

    ],
    'Rating': '4.7',
    'Featured Image': 'https://m.media-amazon.com/images/I/61mpMH5TzkL._AC_SY355_.jpg',
}

Great job! We'll collect this product data into a CSV file in the next section.

Step #4: Export Data to CSV

The last step is to write the extracted data into a CSV file, allowing you to store the product information for further analysis.

Let's update the previous code to reflect these changes.

First, import Python's CSV package. Specify a CSV file name, open a new CSV file in write mode, and insert the extracted data:

Example
# import the required libraries

# ...

import csv

# ...

# define the CSV file name for storing scraped data
csv_file = "product.csv"
# ...

# open the CSV file in write mode with proper encoding
with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object
    writer = csv.writer(file)

    # write the header row to the CSV file
    writer.writerow(data.keys())

    # write the data row to the CSV file
    writer.writerow(data.values())

# print a confirmation message after successful data extraction and storage
print("Scraping completed and data written to CSV")

Merge the above snippet with the previous scraper. Here's the final code:

Example
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import csv

# set up Chrome options
options = webdriver.ChromeOptions()

# run Chrome in headless mode
options.add_argument("--headless=new")

# install ChromeDriver and set up the driver instance
driver = webdriver.Chrome(
    options=options, service=Service(ChromeDriverManager().install())
)

# specify the target URL
target_url = (
    "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"
)

# visit the target URL
driver.get(target_url)

# extract the product name
product_name = driver.find_element(By.ID, "title").text

# find the price element with JavaScript's querySelector
price_element = driver.execute_script(
    'return document.querySelector(".a-price.a-text-price span.a-offscreen")'
)

# get the text of the listing price
price = driver.execute_script("return arguments[0].textContent", price_element)

# extract the description list
description_list = driver.find_element(
    By.CSS_SELECTOR, "ul.a-unordered-list.a-vertical.a-spacing-mini"
)

# find all list items within the description list
description_items = description_list.find_elements(By.TAG_NAME, "li")

# create an empty list to collect the descriptions
description_data = []

# collect and store all product description texts
for item in description_items:
    # get the text content of the span within the li
    description_text = item.find_element(By.TAG_NAME, "span").text.strip()
    description_data.append(description_text)

# extract the rating score
ratings = driver.find_element(By.ID, "acrPopover").text

# select the div element containing the featured image
image_element = driver.find_element(By.ID, "imgTagWrapperId")

# scrape the image tag from its parent div
product_image = image_element.find_element(By.TAG_NAME, "img")

# get the image src attribute
product_image_url = product_image.get_attribute("src")

# create a dictionary to store scraped product data
data = {
    "Name": product_name,
    "Price": price,
    "Description": description_data,
    "Rating": ratings,
    "Featured Image": product_image_url,
}

# define the CSV file name for storing scraped data
csv_file = "product.csv"

# open the CSV file in write mode with proper encoding
with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object
    writer = csv.writer(file)

    # write the header row to the CSV file
    writer.writerow(data.keys())

    # write the data row to the CSV file
    writer.writerow(data.values())

# print a confirmation message after successful data extraction and storage
print("Scraping completed and data written to CSV")

# quit the driver instance
driver.quit()

The final code creates a product.csv file with the extracted product data in your project root directory. See the CSV file below:

Amazon Product Data CSV
Click to open the image in full screen

Awesome! You now know how to extract data from Amazon using Selenium with Python. However, scraping Amazon with Selenium comes with a few challenges you should know.

Challenges of Web Scraping Amazon With Selenium

Although Selenium is a great scraping tool, it may be insufficient to scrape Amazon, especially if you're extracting data from the e-commerce store at scale. Let's look at a few weaknesses of the Selenium-based scraper, along with the solutions for them.

Blocks and Bans

Amazon is well protected and often challenging to scrape at scale. It employs security measures, such as invisible JavaScript challenges, rate-limited IP bans, and CAPTCHAs, to restrict automated programs from accessing its pages. 

Unfortunately, bypassing these restrictions with base headless browsers like Selenium can be difficult. That's because headless browsers show bot-like properties like the WebDriver and lack the required evasion strategies to avoid advanced anti-bot detection.

Even if you follow best practices, such as changing the User Agent and fixing proxies, Amazon's defense mechanism is powerful enough to detect and block your request. The best way to scrape Amazon without getting blocked is using a web scraping API like ZenRows. We'll explain more later.  

Inefficient Performance

Headless browsers like Selenium take up the system's memory, resulting in poor performance. Increasing wait time, managing browser instances, handling JavaScript, and loading resources reduce Selenium's performance.

Here are a few ways to speed up Selenium:

  • Run the browser in headless mode.
  • Block extra resources, such as images and CSS. 
  • Use optimized selectors such as CSS selectors.
  • Run multiple Selenium instances in parallel using cloud grids.

There are more ways to optimize Selenium's speed. Read our detailed guide on speeding up Selenium to learn more.

Changes in Page Layout

Big websites like Amazon often change the DOM layout, including element structure and attributes, causing your previous selectors to fail. 

One way to mitigate this challenge is to monitor the site's HTML layout and update your selectors regularly to reflect layout changes. Another good practice is to isolate element selectors from your scraping logic using the page object model (POM). This approach allows you to locate and fix outdated selectors quickly.

Selenium Alternative: Scrape Amazon With a Web Scraping API

A web scraping API is the best solution to scrape any website at scale without getting blocked. One of its advantages is that it's compatible with any programming language and easy to implement. Additionally, a web scraping API works every time despite the frequent security updates of anti-bot systems. 

ZenRows is the most popular web scraping API. It features a dedicated Amazon scraper, a ready-made solution for extracting the correct data from Amazon at scale without stress.

ZenRows's dedicated Amazon scraper helps you to:

  • Optimize your requests and auto-bypasses CAPTCHAs and other anti-bot mechanisms.
  • Automatically extract accurate data in JSON format.
  • Auto-parse data from various Amazon pages, including products, listings, search results, best sellers, questions and answers, and more.
  • Auto-rotate premium proxies to avoid Amazon's rate-limited IP bans.
  • Grants you access to localized products in 185+ countries around the world.

You only need a single API call, and ZenRows handles the scraping task under the hood. Let's try ZenRows with the previous product page to see how it works.

Sign up to load the Request Builder.

Paste the product URL in the link box and activate Premium Proxies and JS Rendering.

Select Python as your programming language and choose the API connection mode. Copy and paste the generated code into your Python file:

building a scraper with zenrows
Click to open the image in full screen

The generated code should look like this:

Example
# pip install requests
import requests

url = "https://www.amazon.com/Logitech-G502-Performance-Gaming-Mouse/dp/B07GBZ4Q68/"

apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
    "autoparse": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

The above code parses the product page and returns a JSON format of the extracted product details:

Output
{
    "answers": "Search this page",
    "availability": "In Stock",
    "avg_rating": "4.7 out of 5 stars",
    "category": "Video Games > PC > Accessories > Gaming Mice",
    "description": "Logitech updated its iconic G502 gaming mouse...",
    "out_of_stock": false,
    "price": "$44.90",
    "price_without_discount": "$79.99",
    "title": "Logitech G502 HERO High Performance Wired Gaming Mouse, HERO 25K Sensor, 25,600 DPI, RGB, Adjustable Weights, 11 Programmable Buttons, On-Board Memory, PC / Mac",
    "features": [
        {"Brand": "Logitech G"},
        {"Series": "Logitech G502 HERO High Performance Gaming Mouse"},
        {"Item model number": "910-005469"},
        # ... omitted for brevity,
        {"Manufacturer": "Logitech"},
        {"ASIN": "B07GBZ4Q68"},
        {"Is Discontinued By Manufacturer": "No"},
        {"Date First Available": "August 24, 2018"},
        {
            "Best Sellers Rank": "#15 in Video Games (See Top 100 in Video Games)   #1 in PC Gaming Mice"
        },
    ],
    "images": ["https://m.media-amazon.com/images/I/61mpMH5TzkL._AC_SL1500_.jpg"],
}

Congratulations! You just parsed an Amazon product page automatically with ZenRows.

Conclusion

You've seen how to scrape Amazon with Selenium in Python, including tips for optimizing your scraper. Here's a recap of what you've learned:

  • Build a basic Amazon scraper to access the product page.
  • Extract specific product data from an Amazon product page.
  • Write the scraped data to a CSV file.
  • Beat the challenges of scraping data from Amazon.

As mentioned, scraping Amazon at scale is challenging, as your scraper faces potential IP bans and anti-bot measures. The easiest way to get your desired Amazon product data is to use ZenRows' Amazon scraper.

Try ZenRows for free now without a credit card!

Ready to get started?

Up to 1,000 URLs for free are waiting for you