eBay is one of the largest e-commerce websites in today's market. Therefore, accessing and analyzing its data can offer numerous benefits, including a competitive edge in a continuously evolving industry.
In this guide, we'll walk you through the step-by-step process of retrieving publicly available product information from eBay. You'll also learn strategies to avoid getting blocked, scrape multiple pages, and store scraped data in a structured format for faster analysis.
Let's roll.
- Step 1: Prerequisites.
- Step 2: Scrape eBay product data.
- Step 3: Export scraped eBay data to CSV.
- Scraping multiple eBay listings.
- Easiest solution to scrape eBay.
Step 1: Prerequisites
To follow along, ensure you meet these requirements:
- Python.
- Requests.
- BeautifulSoup.
Here are steps you can take to get ready for this tutorial.
Enter the following command in your terminal to verify your Python installation.
python --version
If Python is installed on your machine, you'll get the corresponding version when you run the command above.
For example:
Python 3.13.0
Next, install the required libraries: Python Requests and BeautifulSoup.
pip3 install requests beautifulsoup4
That's it. You're all set up.
If you haven't already, navigate to the directory where you'd like to store your code and create a Python file (scraper.py).
Open this file using your preferred IDE, and prepare to write some code.
Step 2: Scrape eBay Product Data
We'll scrape real-world examples to make this tutorial practical, starting with the following eBay product page.
After retrieving the full HTML of the target page, we'll extract the following specific data points:
- Product name.
- Price.
- Images.
- Description.
- Reviews.
Each step we cover in this guide will break down how to locate and scrape HTML elements containing the desired data points. By the end of this tutorial, you'll have an eBay scraper capable of retrieving and storing valuable information in a structured format.
To begin, here's a basic scraper to retrieve the page's full HTML.
import requests
# define target URL
url = "https://www.ebay.com/itm/125575167955?_skw=mouse+wireless"
# make a GET request to the target URL
response = requests.get(url)
# retrieve HTML content
html = response.text
print(html)
Here's the output. We've truncated the result below to keep things simple.
<html lang="en">
<head>
<!---- ---->
<title>
Portable Wireless Mouse, 2.4GHz Silent with USB Receiver, Optical USB Mouse | eBay
</title>
<!---- ---->
</head>
</html>
However, eBay uses advanced rate-limiting technologies that can block your request once you exceed its threshold. If you're facing this challenge, commonly recommended techniques, such as routing your requests through Python Requests proxies and setting a custom User-Agent header, can get you over the hump.
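As an illustration, here's a minimal sketch of that approach. The proxy address and User-Agent string below are placeholders rather than values from this tutorial, so swap in your own before running it.
import requests
# placeholder proxy and a common desktop User-Agent (replace with your own values)
proxies = {
    "http": "http://<YOUR_PROXY_ADDRESS>:<PORT>",
    "https": "http://<YOUR_PROXY_ADDRESS>:<PORT>",
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}
url = "https://www.ebay.com/itm/125575167955?_skw=mouse+wireless"
# route the request through the proxy and send the custom header
response = requests.get(url, headers=headers, proxies=proxies)
print(response.status_code)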
If all else fails, hang on for a more reliable and easier solution in a later section, or you can check it out now.
Find and Extract the Product Name
Now that we have the raw HTML file, we can extract specific data points like the product name. But to do that, you must first parse the HTML.
Thus, import the BeautifulSoup library and create a soup object.
# import the required libraries
# ...
from bs4 import BeautifulSoup
# ...
# parse the raw HTML file
soup = BeautifulSoup(html, "html.parser")
This converts the raw HTML file into a parse tree, which you can navigate and interact with to extract data points.
Now you can find and extract the product name.
For that, inspect the target page in a browser to identify the HTML element containing the product name: Right-click on the product name and select Inspect. This will open the Developer tools, where you can analyze the raw HTML.
You'll find that the product name is the only <h1> tag on the page. Therefore, using BeautifulSoup's find() method, locate the <h1> tag and extract its text content.
# ...
# ... find and extract the product name ... #
# select h1 tag and extract its text content
product_name = soup.find("h1").text.strip()
print(f"Product Name: {product_name}")
This will output the following result.
Product Name: Portable Wireless Mouse, 2.4GHz Silent with USB Receiver, Optical USB Mouse
Locate and Get the Price
Using the same inspection techniques as in the previous steps, locate the HTML tag containing the product price.
You'll find that the price is within a <div> block with the class x-price-primary. Using this information, select the <div> block and extract its text content.
# ...
# ... locate and get the price ... #
# select price div and extract its text content
price = soup.find("div", {"class": "x-price-primary"}).text.strip()
print(f"Price: {price}")
This code outputs the product's price, as seen below.
Price: US $7.95/ea
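If you plan to analyze prices numerically, you may also want to strip the currency text. Here's a small optional sketch that pulls the numeric part out of the price string with a regular expression.
import re
# extract the numeric portion of a price string like "US $7.95/ea"
match = re.search(r"[\d,]+\.?\d*", price)
price_value = float(match.group().replace(",", "")) if match else None
print(price_value)  # 7.95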
Extract Product Images
The product images are structured in a carousel container. Any image you click on or hover over appears as the primary image. This means you can extract all the product images on the page by scraping the carousel container.
To do that, inspect the page to locate the HTML element containing the images and identify the right selectors.
You'll find that the carousel container is a <div> element with the class ux-image-grid. Each image is an <img> tag within button elements.
Select the carousel container using this information, then find all <img> tags within the container and extract their src attributes.
# ...
# initialize an empty list to store image data
image_data = []
# select the carousel container
carousel_container = soup.find("div", {"class": "ux-image-grid"})
# find all the img tags within the carousel container
images = carousel_container.find_all("img")
if images:
# loop through each img tag and extract the src attribute
for image in images:
image_data.append(image.get("src"))
print(image_data)
This code stores the image URLs in a list and logs the list in your console, as shown below.
[
'https://i.ebayimg.com/images/g/jpoAAOSwrf9jVE4k/s-l140.jpg', 'https://i.ebayimg.com/images/g/1i8AAOSwSyhjVE5R/s-l140.jpg', 'https://i.ebayimg.com/images/g/SqUAAOSwD45jVE5S/s-l140.jpg',
# ... omitted for brevity ... #
]
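If you also want the image files themselves, here's a short optional sketch that downloads each URL in image_data with Requests; the images folder name and .jpg extension are just illustrative choices.
import os
import requests
# create a local folder for the downloaded images
os.makedirs("images", exist_ok=True)
# download every image URL collected above
for index, image_url in enumerate(image_data):
    image_response = requests.get(image_url)
    with open(os.path.join("images", f"image_{index}.jpg"), "wb") as f:
        f.write(image_response.content)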
Locate and Get Product Descriptions
The product descriptions are divided into two sections: a quick overview, which is in a table structure, and a more detailed main description, which is displayed as a list of items.
As before, inspect the raw HTML to identify the right selectors.
You'll notice that the product descriptions are embedded in an <iframe> with the ID desc_ifr. This <iframe> has a src attribute pointing to the URL of the HTML containing the product description structure.
To extract this data, you need to make a GET request to the URL in the <iframe>, retrieve the HTML, and parse it to get the product descriptions.
Let's start by locating the <iframe> and retrieving the HTML of the product description URL using the information above.
# ...
# ... locate and get product descriptions ... #
# initialize an empty list to store description data
description_data = []
# locate the iframe
iframe = soup.find('iframe', {'id': 'desc_ifr'})
if iframe:
# get the src URL of the iframe
iframe_url = iframe.get('src')
# retrieve the HTML of the iframe content (description)
iframe_content = (requests.get(iframe_url)).text
# ...
else:
print("iframe not found")
After that, parse the HTML to extract the product description. To do that, you also need to inspect this HTML.
The overview section of the description is a table, and the main descriptions are list items within a <div> block with the ID feature-bullets.
Using these details, select the table, loop through each row, and extract its cell data.
# ...
# parse the iframe content
iframe_soup = BeautifulSoup(iframe_content, 'html.parser')
# select the description overview table
overview_table = iframe_soup.find("table")
# find all table cells, loop through each, and extract their data
for row in overview_table.find_all('tr'):
cells = [td.text.strip() for td in row.find_all('td')]
description_data.append({"overview": cells})
Next, select the main description <div> container, find all list items within the container, loop through, and extract their text content.
# ...
# select main description container
main_description = iframe_soup.find(id = "feature-bullets")
# find all list items with main description container
description_list = main_description.find_all("li")
# loop each list and extract its text content
for list in description_list:
features = list.text.strip()
description_data.append({"main_description": list.text.strip()})
Now, put all the steps together to get the following complete code.
# ...
# ... locate and get product descriptions ... #
# initialize an empty list to store description data
description_data = []
# locate the iframe
iframe = soup.find('iframe', {'id': 'desc_ifr'})
if iframe:
# get the src URL of the iframe
iframe_url = iframe.get('src')
# retrieve the HTML of the iframe content (description)
iframe_content = (requests.get(iframe_url)).text
# parse the iframe content
iframe_soup = BeautifulSoup(iframe_content, 'html.parser')
# select the description overview table
overview_table = iframe_soup.find("table")
# find all table cells, loop through each, and extract their data
for row in overview_table.find_all('tr'):
cells = [td.text.strip() for td in row.find_all('td')]
description_data.append(cells)
# select main description container
main_description = iframe_soup.find(id = "feature-bullets")
# find all list items with main description container
description_list = main_description.find_all("li")
# loop each list and extract its text content
for list in description_list:
features = list.text.strip()
description_data.append(list.text.strip())
print(description_data)
else:
print("iframe not found")
This code stores the product descriptions in a list and outputs the following result:
[
['Connectivity Technology', 'USB'], ['Special Feature', 'wireless'], ['Number of Buttons', '4'], ['Hand Orientation', 'Ambidextrous'],
'▶[HIGH DURABILITY & STABLE CONNECTION]: computer mouse has 5,000,000 clicks ...',
# ... truncated for brevity ... #
]
Locate and Extract Product Reviews
The product reviews are listed at the bottom of the page. As in previous steps, inspect the raw HTML to identify the HTML elements containing the product reviews.
You'll find that each review is a list item with the class fdbk-container. Thus, select all <li> items, loop through each, and extract their text content.
# ...
# ... locate and extract product reviews ... #
# create an empty list to store review data
review_data = []
# find all reviews
review_lists = soup.find_all("li", {"class": "fdbk-container"})
if review_lists:
# loop through each list and extract its text content
for list in review_lists:
review = list.find("div", {"class": "fdbk-container__details__comment"}).text.strip()
review_data.append(review)
print(review_data)
else:
print("reviews not found")
This code will output the product reviews as shown below.
[
'Awesome seller! Accidentally ordered this twice and was quickly refunded. ...',
'This wireless mouse is by far the most ergonomic and quiet clicking mouse I have ...',
'Silent as advertised, ... Great quality.',
# .. truncated for brevity ... #
]
Step 3: Export Scraped eBay Data to CSV
When scraping eBay, storing data in a structured format is often essential for easy analysis. In this section, you'll learn how to export scraped data to CSV using Python's csv module.
Let's start by organizing your scraped data. To do that, create an empty list to store product data and append all data points to the list.
# import the required libraries
# ...
import csv
# ...
# initialize an empty list to store all product data
product_data = []
# ...
# append scraped data to product_data list
product_data.append ({
"Product Name": product_name,
"Price": price,
"Images": image_data,
"Product Description": description_data,
"Product Review": review_data,
})
Next, open a CSV file in write mode, create a DictWriter object, and define the field names. Since product_data is a list of dictionaries, you can use the keys of each dictionary to set the column headers.
# ...
# ... export scraped data to CSV ... #
# open a CSV file for writing
with open("outputs.csv", mode="w", newline="", encoding="utf-8") as file:
# create a CSV writer object using DictWriter
writer = csv.DictWriter(file, fieldnames=product_data[0].keys())
Lastly, write the header row and the data rows.
# ...
# write the header row to the CSV file
writer.writeheader()
# write the data rows to the CSV file
for data in product_data:
writer.writerow(data)
That's it.
Now, put all the steps you've learned so far together. You'll get the following complete code.
# import the required libraries
import requests
from bs4 import BeautifulSoup
import csv
# define target URL
url = "https://www.ebay.com/itm/125575167955?_skw=mouse+wireless"
# make a GET request to the target URL
response = requests.get(url)
# retrieve HTML content
html = response.text
# parse the raw HTML file
soup = BeautifulSoup(html, "html.parser")
# initialize an empty list to store all product data
product_data = []
# ... find and extract the product name ... #
# select h1 tag and extract its text content
product_name = soup.find("h1").text.strip()
# ... locate and get the price ... #
# select price div and extract its text content
price = soup.find("div", {"class": "x-price-primary"}).text.strip()
# ... extract product images ... #
# initialize an empty list to store image data
image_data = []
# select the carousel container
carousel_container = soup.find("div", {"class": "ux-image-grid"})
# find all the img tags within the carousel container
images = carousel_container.find_all("img")
if images:
# loop through each img tag and extract the src attribute
for image in images:
image_data.append(image.get("src"))
# ... locate and get product descriptions ... #
# initialize an empty list to store description data
description_data = []
# locate the iframe
iframe = soup.find('iframe', {'id': 'desc_ifr'})
if iframe:
# get the src URL of the iframe
iframe_url = iframe.get('src')
# retrieve the HTML of the iframe content (description)
iframe_content = (requests.get(iframe_url)).text
# parse the iframe content
iframe_soup = BeautifulSoup(iframe_content, 'html.parser')
# select the description overview table
overview_table = iframe_soup.find("table")
# find all table cells, loop through each, and extract their data
for row in overview_table.find_all('tr'):
cells = [td.text.strip() for td in row.find_all('td')]
description_data.append(cells)
# select main description container
main_description = iframe_soup.find(id = "feature-bullets")
# find all list items with main description container
description_list = main_description.find_all("li")
# loop each list and extract its text content
for list in description_list:
features = list.text.strip()
description_data.append(list.text.strip())
else:
print("iframe not found")
# ... locate and extract product reviews ... #
# create an empty list to store review data
review_data = []
# find all reviews
review_lists = soup.find_all("li", {"class": "fdbk-container"})
if review_lists:
# loop through each list and extract its text content
for list in review_lists:
review = list.find("div", {"class": "fdbk-container__details__comment"}).text.strip()
review_data.append(review)
else:
print("reviews not found")
# append scraped data to product_data list
product_data.append ({
"Product Name": product_name,
"Price": price,
"Images": image_data,
"Product Description": description_data,
"Product Review": review_data,
})
# ... export scraped data to CSV ... #
# open a CSV file for writing
with open("outputs.csv", mode="w", newline="", encoding="utf-8") as file:
# create a CSV writer object using DictWriter
writer = csv.DictWriter(file, fieldnames=product_data[0].keys())
# write the header row to the CSV file
writer.writeheader()
# write the data rows to the CSV file
for data in product_data:
writer.writerow(data)
This code creates a new outputs.csv file and exports the scraped data to CSV. You'll find this file in your project's root directory.
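To quickly confirm the export worked, you can read the file back with Python's csv.DictReader. This is just an optional sanity check.
import csv
# read the exported file back and print each product name
with open("outputs.csv", newline="", encoding="utf-8") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row["Product Name"])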
Here's a truncated screenshot sample for reference:
Awesome! You've created your first eBay web scraper.
Scraping Multiple eBay Listings
If you want to extract product information at scale, scraping multiple pages is essential. This follows the same approach as the previous steps, with the added step of handling pagination.
You need to navigate each results page and extract all the necessary data. That requires understanding the website's pagination structure and using it to move from page to page.
Let's break it down into steps.
For this tutorial, we'll scrape an eBay search results page, collecting all the product URLs from the first 10 pages.
eBay paginates its search results, and we can take advantage of this structure to extract product data from multiple pages.
If you scroll to the bottom of the page, you'll find the Next button, which navigates between the search result pages.
Starting from the initial search result, you can continuously click the Next button to access the following pages. In other words, the next page URL is the href attribute of the Next button.
Using this information, you can navigate through the search result pages and extract all product links.
Here's how:
Start by defining the base URL. This is the initial URL of the search result page.
# import the required libraries
# ...
# define the base URL
base_url = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=mouse&_sacat=0"
After that, create a function to extract all product links from the current page.
To do that, inspect the page to identify the right selectors for each product's URL.
You'll find that each product is a list item with the class s-item s-item__pl-on-bottom, and the product URL is the href attribute of an anchor tag within it.
Using this information, create the function to extract product links. This function will select all product listings, loop through each, and extract the href attribute of each listing's anchor tag.
# ...
# function to extract product links from a single page
def extract_product_links(soup):
# initialize an empty list to store links
product_links = []
# select all listings on current page
product_listings = soup.find_all("li", {"class": "s-item s-item__pl-on-bottom"})
# loop through listing and retrieve links
for listing in product_listings:
# find anchor tag, and extract its href attribute
product_link = listing.find("a").get("href")
# append the product URL to the list
product_links.append(product_link)
return product_links
Next, create a function to handle pagination.
Remember that the next page URL is the href attribute of the Next button. Thus, inspect the page's pagination to identify the right selector.
You'll find that the Next button is an anchor tag with its type attribute set to next.
Now, select the Next button and return its href attribute.
Since we're only interested in the first 10 pages, add logic to stop scraping after 10 pages. To do this, include the current page number as an argument in your function and return None once it reaches 10.
#...
# function to handle pagination
def get_next_page_url(soup, current_page_number):
# check if current page number is greater than 10
if current_page_number >= 10:
return None
# identify the next button and retrieve its href attribute
next_button = soup.find("a", {"type": "next"})
# check if the next page exists else, return none
if next_button and "href" in next_button.attrs:
return next_button["href"]
return None
Next, create a final function to extract all links from each search result page.
In this function, call extract_product_links() to scrape the current page and use get_next_page_url() to navigate through the search result pages.
Also, add a small time delay between scraping pages to avoid triggering anti-bot restrictions.
Remember to initialize the current page number and increment it when calling the get_next_page_url() function.
# import the required libraries
#...
import time
# ...
# function to extract product links from each search result page
def scrape_ebay_pages(base_url):
current_url = base_url
all_product_links = []
# initialize current page number
current_page_number = 1
# make a GET request to current page
while current_url:
print(f"Scraping: {current_url}")
response = requests.get(current_url)
# parse the raw HTML of current page
soup = BeautifulSoup(response.text, "html.parser")
# call the extract_product_links function to scrape the current page
product_links = extract_product_links(soup)
# add links to the all_links list after scraping current page
all_product_links.extend(product_links)
print(f"Found {len(product_links)} product links on this page.")
# get the URL for the next page and stop scraping after 10 pages
next_page = get_next_page_url(soup, current_page_number)
if next_page:
current_url = next_page
# increment current page number
current_page_number +=1
else:
print("First 10 pages scraped.")
break
# add time delay after scraping each page.
time.sleep(2)
return all_product_links
That's it.
To start the scraping process, call the scrape_ebay_pages() function.
# ...
# start the scraping process
links = scrape_ebay_pages(base_url)
Now, put all the steps together and export the collected links to CSV to get the following complete code.
# import the required libraries
import requests
from bs4 import BeautifulSoup
import time
import csv
# define the base URL
base_url = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=mouse&_sacat=0"
# function to extract product links from a single page
def extract_product_links(soup):
# initialize an empty list to store links
product_links = []
# select all listings on current page
product_listings = soup.find_all("li", {"class": "s-item s-item__pl-on-bottom"})
# loop through listing and retrieve links
for listing in product_listings:
# find anchor tag, and extract its href attribute
product_link = listing.find("a").get("href")
# append the product URL to the list
product_links.append(product_link)
return product_links
# function to handle pagination
def get_next_page_url(soup, current_page_number):
# check if current page number is greater than 10
if current_page_number >= 10:
return None
# identify the next button and retrieve its href attribute
next_button = soup.find("a", {"type": "next"})
# check if the next page exists else, return none
if next_button and "href" in next_button.attrs:
return next_button["href"]
return None
# function to extract product links from each search result page
def scrape_ebay_pages(base_url):
current_url = base_url
all_product_links = []
# initialize current page number
current_page_number = 1
# make a GET request to current page
while current_url:
print(f"Scraping: {current_url}")
response = requests.get(current_url)
# parse the raw HTML of current page
soup = BeautifulSoup(response.text, "html.parser")
# call the extract_product_links function to scrape current page
product_links = extract_product_links(soup)
# add links to the all_links list after scraping current page
all_product_links.extend(product_links)
print(f"Found {len(product_links)} product links on this page.")
# get the URL for the next page and stop scraping after 10 pages
next_page = get_next_page_url(soup, current_page_number)
if next_page:
current_url = next_page
# increment current page number
current_page_number +=1
else:
print("First 10 pages scraped.")
break
# add time delay after scraping each page.
time.sleep(2)
return all_product_links
# start the scraping process
product_links = scrape_ebay_pages(base_url)
# export to CSV
with open("product_links.csv", mode="w", newline="", encoding="utf-8") as file:
writer = csv.writer(file)
writer.writerow(["product_link"])
for product_link in product_links:
writer.writerow([product_link])
This code navigates through the search result pages, extracts all the product links on each page, and exports them to CSV.
You'll find a new product_links.csv file in your project's root directory. Here's a sample screenshot for reference.
Congratulations! You now have an eBay web scraper capable of scraping multiple pages.
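From here, a natural next step is to feed each collected link back into the product scraper from Step 2. Here's a rough sketch of that idea; scrape_product() is a hypothetical helper that wraps the earlier extraction logic, and it only grabs the name and price for brevity.
import time
import requests
from bs4 import BeautifulSoup

def scrape_product(url):
    # retrieve and parse a single product page, reusing the selectors from step 2
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    name_tag = soup.find("h1")
    price_tag = soup.find("div", {"class": "x-price-primary"})
    return {
        "Product Name": name_tag.text.strip() if name_tag else None,
        "Price": price_tag.text.strip() if price_tag else None,
        "URL": url,
    }

all_products = []
for link in product_links:
    all_products.append(scrape_product(link))
    # pause between requests to reduce the chance of getting blocked
    time.sleep(2)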
Easiest Solution to Scrape eBay
If eBay blocks your attempt to retrieve its page's HTML, use the ZenRows eBay Scraper API to overcome this challenge. This solution offers everything you need to avoid detection while scraping eBay.
With features like premium proxies, JavaScript rendering, fingerprint spoofing, advanced anti-bot bypass, and more, ZenRows allows you to focus on extracting your desired data rather than the intricacies of circumventing anti-bot solutions.
Some additional benefits of using the ZenRows eBay Scraper API include:
✅ Extract complete product information with a few lines of code.
✅ Download structured data in JSON and easily store it in a usable format, such as CSV.
✅ Get tailored scrapers for specific data sets, including images, price, and seller.
✅ Quickly scrape up to 1,000 pages at zero cost as part of ZenRows' free trial.
To use this tool, sign up to get your free API key.
You'll be redirected to the Request Builder page, where your ZenRows API key is at the top right.
Input your target URL and activate the Premium Proxies and JS Rendering mode.
Then, select the Python language option and choose the API option. ZenRows works with any language and provides ready-to-use snippets for the most popular ones.
Remember to select your desired output format. The autoparse option parses the HTML and returns a JSON result.
Lastly, copy the generated code on the right into your editor to test it.
import requests
url = 'https://www.ebay.com/itm/125575167955?_skw=mouse+wireless'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
'url': url,
'apikey': apikey,
'js_render': 'true',
'premium_proxy': 'true',
'autoparse': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
This code retrieves the HTML and automatically parses it to return the following JSON result with multiple fields.
{
"name": "Portable Wireless Mouse, 2.4GHz Silent with USB Receiver, Optical USB Mouse",
"url": "https://www.ebay.com/itm/125575167955",
# ... truncated for brevity ... #
}
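Since the response is JSON when autoparse is enabled, you can load it with response.json() and keep only the fields you need. Here's a brief sketch that writes the two fields shown above to a CSV file; the zenrows_output.csv filename is just an example, and the exact set of fields in the response may vary.
import csv
# parse the JSON response and keep only the fields we need
data = response.json()
product = {"name": data.get("name"), "url": data.get("url")}
# export the selected fields to CSV
with open("zenrows_output.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=product.keys())
    writer.writeheader()
    writer.writerow(product)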
Congratulations! You're now well-equipped to scrape eBay without getting blocked.
Conclusion
eBay web scraping can provide actionable insights for businesses and individuals looking to run competitive market analysis. In this guide, you've learned how to extract eBay product information and store this data in a structured format.
However, eBay's rate-limiting and anti-bot technologies can block your requests and deny you access. But with the ZenRows web scraping API, you can seamlessly navigate this challenge and scrape at scale without any limitations. Try ZenRows now!