Do you want to scrape product pages, listings, or any other information from Amazon? We've got you covered!
This tutorial will show you how to scrape Amazon with Python, from full-page HTML to specific product details and crawling multiple product listing pages.
Let's go!
Building an Amazon Product Scraper With Python
In this Amazon Python scraping tutorial, we'll scrape an Amazon product page using Python's Requests as the HTTP client and BeautifulSoup as the HTML parser. You'll start by extracting the full-page HTML and then proceed to scrape the following product details:
- Product name.
- Price.
- Rating count.
- Image.
- Product description.
See a sample demo of the target product page below:

Before you begin, let's see the prerequisites.
Step #1: Prerequisites
You'll need to run a few installations to follow this tutorial smoothly. Let's go through them quickly.
Python
This tutorial uses Python version 3.12.1. If you don't have Python installed yet, download and install the latest version from Python's official website.
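You can confirm which version you have installed by running the following from a terminal (use python3 --version on macOS/Linux if needed):
python --version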
Install Requests and BeautifulSoup
You'll use Python's Requests library as the HTTP client and parse the returned HTML with BeautifulSoup. Install both using pip:
pip3 install beautifulsoup4 requests
A suitable IDE
Although we'll use VS Code on a Windows OS, you can follow this tutorial with any IDE you choose.
Once everything is up and running, you're ready to scrape Amazon!
You can also use a headless browser like Selenium to scrape Amazon, so feel free to check our guide.
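If you'd rather take the headless-browser route, here's a rough sketch of what fetching the same page with Selenium 4 could look like (it assumes Chrome is installed locally; Selenium Manager downloads a matching driver automatically):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# run Chrome without a visible window
options = Options()
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)
driver.get("https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/")
# print the fully rendered HTML
print(driver.page_source)
driver.quit()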
Step #2: Retrieve the Page HTML
Let's start with a basic query to extract the full HTML of the target product page using the Requests library. This step ensures that your HTTP client retrieves the website's content as expected.
Create a new scraper.py file in your project root folder and insert the following code:
# import the requests library
import requests

# specify the target URL
target_url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/"
)

# send a get request to the target url
response = requests.get(target_url)

# check if the response status code is not 200 (ok)
if response.status_code != 200:
    # print an error message with the status code
    print(f"An error occurred with status {response.status_code}")
else:
    # get the page html content
    html_content = response.text
    # print the html content
    print(html_content)
The above code outputs the target web page HTML, as shown below. We've omitted some content for brevity:
<!doctype html>
<html lang="en-us" class="a-no-js" data-19ax5a9jf="dingo">
<head>
<!-- ... -->
<!-- DNS Prefetch to improve loading speed of images -->
<link rel="dns-prefetch" href="https://images-na.ssl-images-amazon.com">
<!-- ... -->
<title>Amazon.com: MageGee Portable 60% Mechanical Gaming Keyboard, MK-Box LED Backlit Compact 68 Keys Mini Wired Office Keyboard with Red Switch for Windows Laptop PC Mac - Black/Grey : Video Games</title>
<!-- ... -->
</head>
<body>
<!-- ... -->
</body>
</html>
However, due to potential IP bans and CAPTCHA protection, the Requests library alone may not work with heavily protected websites like Amazon. If you get blocked, you can add a User Agent to Python's Requests library to mimic an actual browser and reduce the chances of further blocks.
To add a custom User Agent, make the following changes to the previous script:
# ...

# specify your custom User Agent
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}

# specify the target URL
target_url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/"
)

# send a get request to the target url with a custom User Agent
response = requests.get(target_url, headers=custom_headers)
While the above measure may secure your scraper against Amazon's CAPTCHA, it's still prone to IP bans, especially if sending multiple requests. One way to prevent potential IP restrictions is to add a proxy to your request to mimic a user from another location.
Below is an example of adding a free proxy to the previous scraper script. Keep in mind that these are free proxies from the Free Proxy List. They may not work at the time of reading due to their short lifespan and unreliability:
# ...

# set the proxy for both http and https connection types
proxies = {
    "http": "http://47.90.205.231:33333",
    "https": "http://47.90.205.231:33333",
}

# send a get request to the target url with a custom User Agent and a proxy
response = requests.get(target_url, headers=custom_headers, proxies=proxies)
Read our article on setting up a proxy with Python's Requests library to learn more.
Your complete code should look like this after adding a proxy and a User Agent:
# import the requests library
import requests

# specify your custom User Agent
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}

# specify the target URL
target_url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/"
)

# set the proxy for both http and https connection types
proxies = {
    "http": "http://47.90.205.231:33333",
    "https": "http://47.90.205.231:33333",
}

# send a get request to the target url with a custom User Agent and a proxy
response = requests.get(target_url, headers=custom_headers, proxies=proxies)

# check if the response status code is not 200 (ok)
if response.status_code != 200:
    # print an error message with the status code
    print(f"An error occurred with status {response.status_code}")
else:
    # get the page html content
    html_content = response.text
    # print the html content
    print(html_content)
We won't be adding a proxy in this tutorial since we're only sending a few requests, which is unlikely to result in an IP ban. Feel free to add a proxy as shown above if you're sending many requests. However, it's best to use premium proxies because free proxies have a short lifespan and are unreliable.
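If you do send many requests, a common pattern is to rotate through a pool of proxies so that no single IP carries all the traffic. Here's a rough sketch that reuses target_url and custom_headers from the previous script; the proxy addresses are placeholders you'd replace with working (ideally premium) proxies:

# ...
import random

# placeholder proxy pool; swap in proxies that actually work for you
proxy_pool = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

# pick a random proxy for each request
proxy = random.choice(proxy_pool)
proxies = {"http": proxy, "https": proxy}
response = requests.get(target_url, headers=custom_headers, proxies=proxies)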
The best approach is to use a web scraping API like ZenRows. ZenRows provides premium proxies with flexible geo-targeting features, including all the tools you need to bypass any anti-bot system at scale.
Ready? Now, let's scrape some real Amazon product data!
Step #3: Scrape Amazon Product Details
The next step is to parse the website's HTML using BeautifulSoup and extract specific product data by locating elements through their IDs and class names.
Let's modify the previous code to parse the page's HTML. Add BeautifulSoup to your imports and use it to parse the HTML like so:
# import the required libraries
# ...
from bs4 import BeautifulSoup

if response.status_code != 200:
    # ...
else:
    # ...
    # parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, "html.parser")
    # ... scraping logic
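As a side note, BeautifulSoup also accepts genuine CSS selector strings through its select() and select_one() methods. The lines below are an optional illustration (not part of the tutorial's main script) of selector-based equivalents of the find() calls you'll write next:

# CSS selector equivalents of the find() calls used later in this tutorial
soup.select_one("#productTitle")       # same element as soup.find(id="productTitle")
soup.select_one("span.a-offscreen")    # same element as soup.find(class_="a-offscreen")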
Now, we'll go through the extraction of each target data step-by-step, starting with the product name.
Amazon's selectors often change due to regular DOM structure updates. Ensure you double-check them when following this tutorial.
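Because a vanished element makes chained calls like soup.find(...).text raise an AttributeError, you may also want a small helper that fails gracefully when a selector stops matching. Here's a minimal sketch; the safe_text helper is our own addition, not part of the tutorial's script:

# return the stripped text of the first matching element, or a default if the selector no longer matches
def safe_text(soup, default="N/A", **kwargs):
    element = soup.find(**kwargs)
    return element.text.strip() if element else default

# example usage: safe_text(soup, id="productTitle") instead of soup.find(id="productTitle").text.strip()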
Locate and Scrape Product Name
First, inspect the product name element. Open the product page on a browser like Chrome. Right-click on the product name and select Inspect to open the DevTools window.
You'll see a highlight on an <h1> tag with the title ID, which wraps a span carrying the productTitle ID. You can expand that span child to view the title text:

Scrape that product title into a data dictionary inside the else statement. Use BeautifulSoup's find method since you expect only one matching element, and call strip to remove surrounding whitespace from the scraped text. Print the dictionary to view the extracted data:
# ...
else:
    # ...
    # create a dictionary to store scraped product data
    data = {
        "Name": soup.find(id="productTitle").text.strip(),
    }
    # print the extracted data
    print(data)
The above outputs the product page's title, as shown:
{
'Name': 'MageGee Portable 60% Mechanical Gaming Keyboard, MK-Box LED Backlit Compact 68 Keys Mini Wired Office Keyboard with Red Switch for Windows Laptop PC Mac - Black/Grey'
}
Locate and Scrape Product Price
Let's also inspect the price element to get an idea of its selector layout before scraping it. Since the bolder price is a Prime deal price rather than the regular one, you want to scrape the actual listing price instead.
Right-click the listing price element and select Inspect. The listing price element is in a span tag with the class name a-offscreen:

Extract the listing price by pointing BeautifulSoup to its class. Modify the data dictionary, as shown below:
# ...
else:
    # ...
    # create a dictionary to store scraped product data
    data = {
        # ...,
        "Price": soup.find(class_="a-offscreen").text.strip(),
    }
The above code adds the listing price to the output. If the extracted value differs from the bold figure displayed on the website, that's because the page shows the discounted deal price, while the a-offscreen element holds the listing price:
{
# ...,
'Price': '$24.94'
}
You've now scraped the product's listing price. The next on our list is the rating count.
Locate and Scrape the Rating Count
To extract the number of people who have rated the target product, let's first inspect the target element as we did earlier (right-click the rating count and select Inspect).
You'll see that it's in a span tag with an ID of acrCustomerReviewText:

That's easy to scrape since it uses an ID. Add the following line to your data dictionary to extract the rating count:
# ...
else:
    # ...
    # create a dictionary to store scraped product data
    data = {
        # ...,
        "Rating count": soup.find(id="acrCustomerReviewText").text.strip(),
    }
The above code updates the extracted data with the rating count:
{
# ...,
'Rating count': '7,380 ratings'
}
Next, you'll extract the product's image URLs and description.
Scrape the Product Images
This task has two parts. First, you'll extract the featured image URL. Then, you'll scrape the supporting images in the vertical grid.
Let's start with the featured image. Again, inspect the image element (right-click the featured image and select Inspect). The image is inside a div tag with an ID of imgTagWrapperId:

Extend the data dictionary with logic that extracts the src attribute from the image inside that div:
else:
    # ...
    # create a dictionary to store scraped product data
    data = {
        # ...,
        "Featured image": soup.find(id="imgTagWrapperId").find("img").get("src"),
    }
The code adds the featured image URL to the extracted data. See the result below:
{
# ...,
'Featured image': 'https://m.media-amazon.com/images/I/618zZ7u3sUL.__AC_SX300_SY300_QL70_ML2_.jpg'
}
The second task is to extract the URLs of the alternative images in the vertical grid.
Right-click the vertical image grid and select Inspect. The image grid is inside a div tag with the class name altImages. This element contains an unordered list of image elements:

To scrape the images, extract all the image tags (img) from the parent element (div) and loop through them to get their src attributes into a separate list. Add this logic before the data dictionary:
# ...
else:
    # ...
    # extract all image elements from the thumbnail grid
    images = soup.find(id="altImages").find_all("img")
    # create an empty list to collect the smaller images
    image_data = []
    # loop through the image elements to extract their URLs
    for image in images:
        image_data.append(image.get("src"))
Add the list of the scraped image URLs to the data dictionary like so:
# create a dictionary to store scraped product data
data = {
    # ...,
    "Alternative images": image_data,
}
The above code updates the extracted data with a list of the alternative images:
{
# ...,
'Alternative images': [
'https://m.media-amazon.com/images/I/41S5pwGovuL._AC_US40_.jpg',
'https://m.media-amazon.com/images/I/41HOY7Hp4zL._AC_US40_.jpg',
# ... omitted for brevity,
'https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif'
]
}
You've just scraped product images from Amazon! Let's extract one last piece of information before writing the data to a CSV file.
Locate and Scrape Product Description
Place your cursor on the product description (the "About this item" section), then right-click and select Inspect.
Each description is a list item (li) under an unordered list tag (ul). Expand the span tags inside one of the list items, and you'll see the wrapped description text.

To extract all the description texts, get the unordered list element using its class names. In this case, we've used all of its class names to avoid conflicts with similar elements. Then, define an empty list to collect each text, and loop through the parent element's (ul) children to extract the target description texts into that list:
# ...
else:
    # ...
    # find the element containing product descriptions
    descriptions = soup.find(class_="a-unordered-list a-vertical a-spacing-mini")
    # create an empty list to collect the descriptions
    description_data = []
    # collect and store all product description texts
    for description in descriptions.contents:
        description_data.append(description.text.strip())
Add the extracted description text list to the data dictionary like so:
# ...
else:
    # ...
    # create a dictionary to store scraped product data
    data = {
        # ...,
        "Description": description_data,
    }
The code extracts the description texts as shown below:
{
# ...,
'Description': [
'Mini portable 60% compact layout: MK-Box is ...',
# ... omitted for brevity,
'Extensive compatibility: MageGee MK-Box mechanical...'
]
}
That's it! You've completed the initial tasks.
Let's combine all the snippets to see what the complete code looks like:
# import the required libraries
import requests
from bs4 import BeautifulSoup

# specify your custom User Agent
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}

# specify the target URL
target_url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/"
)

# send a get request to the target url with a custom User Agent
response = requests.get(target_url, headers=custom_headers)

# check if the response status code is not 200 (ok)
if response.status_code != 200:
    # print an error message with the status code
    print(f"An error occurred with status {response.status_code}")
else:
    # get the page html content
    html_content = response.text

    # parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, "html.parser")

    # extract all image elements from the thumbnail grid
    images = soup.find(id="altImages").find_all("img")
    # create an empty list to collect the smaller images
    image_data = []
    # loop through the image elements to extract their URLs
    for image in images:
        image_data.append(image.get("src"))

    # find the element containing product descriptions
    descriptions = soup.find(class_="a-unordered-list a-vertical a-spacing-mini")
    # create an empty list to collect the descriptions
    description_data = []
    # collect and store all product description texts
    for description in descriptions.contents:
        description_data.append(description.text.strip())

    # create a dictionary to store scraped product data
    data = {
        "Name": soup.find(id="productTitle").text.strip(),
        "Price": soup.find(class_="a-offscreen").text.strip(),
        "Rating count": soup.find(id="acrCustomerReviewText").text.strip(),
        "Featured image": soup.find(id="imgTagWrapperId").find("img").get("src"),
        "Alternative images": image_data,
        "Description": description_data,
    }

    # print the extracted data
    print(data)
Run the above code, and you'll get the following combined output:
{
'Name': 'MageGee Portable 60% Mechanical Gaming ... for Windows Laptop PC Mac - Black/Grey',
'Price': '$24.94',
'Rating count': '7,380 ratings',
'Featured image': 'https://m.media-amazon.com/images/I/618zZ7u3sUL.__AC_SX300_SY300_QL70_ML2_.jpg',
'Alternative images': [
'https://m.media-amazon.com/images/I/41S5pwGovuL._AC_US40_.jpg',
'https://m.media-amazon.com/images/I/41HOY7Hp4zL._AC_US40_.jpg',
# ... omitted for brevity,
'https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/transparent-pixel._V192234675_.gif'
],
'Description': [
'Mini portable 60% compact layout: MK-Box is ...',
# ... omitted for brevity,
'Extensive compatibility: MageGee MK-Box mechanical...'
]
}
You've now scraped product information from the target Amazon page. Let's store the data in a CSV file.
Step 4: Export to CSV
In this step, you'll export the extracted data to a CSV file, ensuring you can access it later for further analysis.
Add Python's built-in csv package to your imports. Then, update the previous code to write the data to a products.csv file:
# ...
import csv

# ...
else:
    # ...
    # define the CSV file name for storing scraped data
    csv_file = "products.csv"

    # open the CSV file in write mode with proper encoding
    with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
        # create a CSV writer object
        writer = csv.writer(file)
        # write the header row to the CSV file
        writer.writerow(data.keys())
        # write the data row to the CSV file
        writer.writerow(data.values())

    # print a confirmation message after successful data extraction and storage
    print("Scraping completed and data written to CSV")
After updating the previous scraper code with the one above, you'll get the following final code:
# import the required libraries
import requests
from bs4 import BeautifulSoup
import csv

# specify your custom User Agent
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}

# specify the target URL
target_url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/"
)

# send a get request to the target url with a custom User Agent
response = requests.get(target_url, headers=custom_headers)

# check if the response status code is not 200 (ok)
if response.status_code != 200:
    # print an error message with the status code
    print(f"An error occurred with status {response.status_code}")
else:
    # get the page html content
    html_content = response.text

    # parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, "html.parser")

    # extract all image elements from the thumbnail grid
    images = soup.find(id="altImages").find_all("img")
    # create an empty list to collect the smaller images
    image_data = []
    # loop through the image elements to extract their URLs
    for image in images:
        image_data.append(image.get("src"))

    # find the element containing product descriptions
    descriptions = soup.find(class_="a-unordered-list a-vertical a-spacing-mini")
    # create an empty list to collect the descriptions
    description_data = []
    # collect and store all product description texts
    for description in descriptions.contents:
        description_data.append(description.text.strip())

    # create a dictionary to store scraped product data
    data = {
        "Name": soup.find(id="productTitle").text.strip(),
        "Price": soup.find(class_="a-offscreen").text.strip(),
        "Rating count": soup.find(id="acrCustomerReviewText").text.strip(),
        "Featured image": soup.find(id="imgTagWrapperId").find("img").get("src"),
        "Alternative images": image_data,
        "Description": description_data,
    }

    # define the CSV file name for storing scraped data
    csv_file = "products.csv"

    # open the CSV file in write mode with proper encoding
    with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
        # create a CSV writer object
        writer = csv.writer(file)
        # write the header row to the CSV file
        writer.writerow(data.keys())
        # write the data row to the CSV file
        writer.writerow(data.values())

    # print a confirmation message after successful data extraction and storage
    print("Scraping completed and data written to CSV")
The above code exports the extracted data to a products.csv file. You'll find this file in your project root folder:

Look closely at the CSV file, and you'll see that the "Alternative images" and "Description" fields are still written as Python lists. Feel free to split each list item into its own row programmatically, as in the sketch below.
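For instance, here's a minimal sketch of one way to flatten those list fields into separate rows. This helper is our own addition, not part of the tutorial's script, and it assumes the data dictionary from the previous step:

import csv

# split the list fields so each alternative image/description lands on its own row
list_fields = ["Alternative images", "Description"]
scalar_fields = [key for key in data if key not in list_fields]
row_count = max(len(data[field]) for field in list_fields)

with open("products_flat.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    # one column per scalar field, plus one column per flattened list field
    writer.writerow(scalar_fields + ["Alternative image", "Description"])
    for i in range(row_count):
        # repeat the scalar values only on the first row to avoid duplication
        scalars = [data[key] if i == 0 else "" for key in scalar_fields]
        image = data["Alternative images"][i] if i < len(data["Alternative images"]) else ""
        description = data["Description"][i] if i < len(data["Description"]) else ""
        writer.writerow(scalars + [image, description])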
Congrats! You've just built an Amazon scraper that exports data to a CSV.
However, you can scale this by scraping product listings from a search result and even crawling paginated pages. Let's do that in the next section.
Scrape Product Listings
So far, you've only scraped a single keyboard product page. But in most cases, you'll want to scrape many similar product types. You have to search for that product via Amazon's search bar to see all available listings for that specific product.
For instance, searching for "keyboards" returns many available keyboard brands, as shown below:

Look at your browser's address bar; the "keyboards" query keyword is in the URL. Here's a formatted version of the URL:
https://www.amazon.com/s?k=keyboards
In this section, you'll scrape the above Amazon keyboard product listing (search result page). Since the products break into multiple pages, you'll also see how to handle pagination to crawl several pages.
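If you want to reuse the scraper for other search terms, you can build the search URL programmatically. Below is a small optional sketch; it assumes the keyword goes into the k query parameter, as seen in the URL above:

from urllib.parse import urlencode

def build_search_url(keyword):
    # URL-encode the keyword and attach it as the "k" query parameter
    return "https://www.amazon.com/s?" + urlencode({"k": keyword})

print(build_search_url("keyboards"))  # https://www.amazon.com/s?k=keyboards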
Scrape Search Pages
Scraping Amazon's search result page is similar to how you scraped a product page. In this case, the only difference is that you have different products sharing identical selectors.
When you open the target listing page in your browser, you'll see that each product's title links to its product page. This means you can extract each product's URL from its title.
Let's inspect the first product to view each product's element structure. Right-click the first product's title and click Inspect. Each product's URL (a tag) sits inside its title element (h2 tag).

You'll loop through the h2 tags to scrape the product links. Look closely at the link attached to each a tag, and you'll see that it doesn't include Amazon's base URL and isn't a complete link. You'll need to concatenate Amazon's base URL with the extracted links to get full URLs.
Let's see how to achieve that.
Amazon's listing pages tend to block requests that originate outside of a browser or lack a trusted source. To increase the chances of success, add a Google Referer header to the previous request headers:
# ...
# specify your custom request headers
custom_headers = {
    # ...,
    "Referer": "https://www.google.com/",
}
Remember that you'll need to concatenate each extracted URL with the website's base URL. Specify Amazon's base URL followed by the listing page URL. Then, define an empty list to collect the scraped links:
# ...
# specify the base URL
base_url = "https://www.amazon.com"
# specify the target URL
target_url = "https://www.amazon.com/s?k=keyboards"
# define an empty list to collect extracted links
listing_data = []
All the other setups remain the same. However, you'll modify the else statement with new scraping logic. Extract all the h2 tags:
# ...
else:
    # ...
    # extract all h2s with the product link
    listings = soup.find_all(
        "h2", class_="a-size-mini a-spacing-none a-color-base s-line-clamp-2"
    )
Loop through the extracted listings and collect the attached product links. Use a condition to check whether the extracted link already contains a protocol (https); if it doesn't, concatenate it with the base URL to form a complete product link. Finally, append each link to the list and print it:
# ...
else:
    # ...
    # loop through the listings (h2s) to extract product URLs
    for link in listings:
        # find the href attribute in each h2
        data = link.find("a", href=True).get("href")
        # concatenate the extracted links with the base URL to form a complete URL
        if not data.startswith("https"):
            data = base_url + data
        listing_data.append(data)
    # output the links
    print(listing_data)
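As an aside, you could also resolve these relative links with urllib.parse.urljoin instead of manual concatenation; it leaves absolute URLs untouched and prefixes relative paths with the base URL. A quick optional sketch (the path below is a shortened example):

from urllib.parse import urljoin

# urljoin prefixes relative paths with the base URL and leaves absolute URLs unchanged
print(urljoin("https://www.amazon.com", "/XVX-Mechanical-Swappable/dp/B0C9ZJHQHM/"))
# https://www.amazon.com/XVX-Mechanical-Swappable/dp/B0C9ZJHQHM/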
Merge all the snippets, and you'll get the following final code:
# import the required libraries
import requests
from bs4 import BeautifulSoup

# specify your custom request headers
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Referer": "https://www.google.com/",
}

# specify the base URL
base_url = "https://www.amazon.com"

# specify the target URL
target_url = "https://www.amazon.com/s?k=keyboards"

# define an empty list to collect extracted links
listing_data = []

# send a get request to the target url with the custom request headers
response = requests.get(target_url, headers=custom_headers)

# check if the response status code is not 200 (ok)
if response.status_code != 200:
    # print an error message with the status code
    print(f"An error occurred with status {response.status_code}")
else:
    # get the page html content
    html_content = response.text

    # parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, "html.parser")

    # extract all h2s with the product link
    listings = soup.find_all(
        "h2", class_="a-size-mini a-spacing-none a-color-base s-line-clamp-2"
    )

    # loop through the listings (h2s) to extract product URLs
    for link in listings:
        # find the href attribute in each h2
        data = link.find("a", href=True).get("href")
        # concatenate the extracted links with the base URL to form a complete URL
        if not data.startswith("https"):
            data = base_url + data
        listing_data.append(data)

    # output the links
    print(listing_data)
The above code scrapes all the product links on the first Amazon search result page, as shown:
[
'https://www.amazon.com/XVX-Mechanical-Swappable-Pre-lubed-Stabilizer/dp/B0C9ZJHQHM/ref=sr_1_1?dib=eyJ2IjoiMSJ9.dHakaxTjnRzg31pEFYNyuCQOrOuf0dF5Z3wzHd8FIltiED2FP5xDcJ0rZ9-gGbs3tFwy-pOukbdWpOulziCJBihwSpbhcVckZsRYV4b_3-cF9EFNBLrt7oqUEUq7cbxGN_CsFUiKqkwZQ5gEutgWo29iKYdIU_oGVPVPGbVlrBvEbIOHNxx-hgrqwJHNp-ByLYeCdpX0hU3G9UQ9mymx68pJtpskSALL1ZGD5jzoAA8.f-tBF7mVmELtg06oXl52cfAHVRGwU47xV4xQv79hyn0&dib_tag=se&keywords=keyboards&qid=1721252373&sr=8-1',
# ... other links omitted for brevity,
'https://www.amazon.com/sspa/click?ie=UTF8&spc=MTo4MjAxNTA2OTAzMDgwMjU2OjE3MjEyNTIzNzM6c3BfYnRmOjMwMDEwMzcyNjI2NDQwMjo6MDo6&url=%2FASUS-II-Switch-Dampening-Hot-Swappable-PBT%2Fdp%2FB0C7KFZ5TL%2Fref%3Dsr_1_22_sspa%3Fdib%3DeyJ2IjoiMSJ9.dHakaxTjnRzg31pEFYNyuCQOrOuf0dF5Z3wzHd8FIltiED2FP5xDcJ0rZ9-gGbs3tFwy-pOukbdWpOulziCJBihwSpbhcVckZsRYV4b_3-cF9EFNBLrt7oqUEUq7cbxGN_CsFUiKqkwZQ5gEutgWo29iKYdIU_oGVPVPGbVlrBvEbIOHNxx-hgrqwJHNp-ByLYeCdpX0hU3G9UQ9mymx68pJtpskSALL1ZGD5jzoAA8.f-tBF7mVmELtg06oXl52cfAHVRGwU47xV4xQv79hyn0%26dib_tag%3Dse%26keywords%3Dkeyboards%26qid%3D1721252373%26sr%3D8-22-spons%26sp_csd%3Dd2lkZ2V0TmFtZT1zcF9idGY%26psc%3D1'
]
You've just extracted product links from the first Amazon search result page. Great job! Let's modify this scraper to follow more pages through pagination.
Handle Pagination
The previous scraper only extracts the product links from the first listing page. However, Amazon breaks the listings into several pages. In this part of the tutorial, you'll follow each page to scrape more product links.
As usual, let's inspect the next button element. Scroll down the listing page, right-click the Next button in the navigation bar, and then click Inspect. Although the next page link has many class names, we'll use s-pagination-next since it's the most descriptive:

Try to navigate to the last page (20). Observe the next page element in the inspection tab, and you'll see that it no longer holds a link, meaning there are no more pages to crawl. You'll use this signal to stop crawling once your scraper reaches the last page.
To extract the product links from all pages, implement logic that iteratively checks for the presence of the next page link in the DOM and terminates crawling once it's gone.
First, insert all the previous logic into a while loop. Then, instead of the previous else statement, use a break to stop the loop if the request fails. Here's the modification:
# ...
while True:
    # ...
    # check if the response status code is not 200 (ok)
    if response.status_code != 200:
        # ...
        break
Add Python's time module to your imports. Find the next page link, check that it exists, and extract its href. Like the previously extracted links, the next page link doesn't include the base URL, so concatenate it with Amazon's base URL. Then, implement a 3-second pause to reduce the request frequency and the chances of getting blocked:
# import the required libraries
# ...
import time

# ...
while True:
    # ...
    # find the next page link
    next_page = soup.find("a", class_="s-pagination-next")
    # check if next page exists and follow its URL if so
    if next_page:
        next_link = next_page.get("href")
        # concatenate the next link with the base URL
        if not next_link.startswith("https"):
            target_url = base_url + next_link
        # pause for 3 seconds before making the next request
        time.sleep(3)
Break the while loop once the next page link disappears from the DOM:
# ...
while True:
    # ...
    else:
        print("No more next page")
        # break the loop after following the pages
        break
Finally, import Python's csv library. Then, export the extracted links to a product_links.csv file:
# import the required libraries
# ...
import csv

# ...
# define the CSV file name for storing scraped data
csv_file = "product_links.csv"

# write the collected links to a CSV file
with open(csv_file, "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile)
    # write the header
    csvwriter.writerow(["Product URL"])
    # write the data
    for link in listing_data:
        csvwriter.writerow([link])

# print a confirmation message after successful data extraction and storage
print("Data written to product_links.csv")
Now, combine the snippets. Here's your final code:
# import the required libraries
import requests
from bs4 import BeautifulSoup
import time
import csv

# specify your custom request headers
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Referer": "https://www.google.com/",
}

# specify the base URL
base_url = "https://www.amazon.com"

# specify the target URL
target_url = "https://www.amazon.com/s?k=keyboards"

# define an empty list to collect extracted links
listing_data = []

while True:
    # send a get request to the target url with the custom request headers
    response = requests.get(target_url, headers=custom_headers)

    # check if the response status code is not 200 (ok)
    if response.status_code != 200:
        # print an error message with the status code
        print(f"An error occurred with status {response.status_code}")
        break

    # get the page html content
    html_content = response.text

    # parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, "html.parser")

    # extract all h2s with the product link
    listings = soup.find_all(
        "h2", class_="a-size-mini a-spacing-none a-color-base s-line-clamp-2"
    )

    # loop through the listings (h2s) to extract product URLs
    for link in listings:
        # find the href attribute in each h2
        data = link.find("a", href=True).get("href")
        # concatenate the extracted links with the base URL to form a complete URL
        if not data.startswith("https"):
            data = base_url + data
        listing_data.append(data)

    # find the next page link
    next_page = soup.find("a", class_="s-pagination-next")

    # check if next page exists and follow its URL if so
    if next_page:
        next_link = next_page.get("href")
        # concatenate the next link with the base URL
        if not next_link.startswith("https"):
            target_url = base_url + next_link
        # pause for 3 seconds before making the next request
        time.sleep(3)
    else:
        print("No more next page")
        # break the loop after following the pages
        break

# define the CSV file name for storing scraped data
csv_file = "product_links.csv"

# write the collected links to a CSV file
with open(csv_file, "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile)
    # write the header
    csvwriter.writerow(["Product URL"])
    # write the data
    for link in listing_data:
        csvwriter.writerow([link])

# print a confirmation message after successful data extraction and storage
print("Data written to product_links.csv")
After crawling all pages, the code writes the extracted links to a CSV file. You'll find this file in your project root folder:

Great job! You've just crawled and extracted product link data from 19 Amazon listing pages using Python's Requests and BeautifulSoup.
Still, you must be aware of a few challenges while scraping Amazon. We'll discuss them in the next section.
Challenges and Solutions for Amazon Web Scraping
Extracting data from Amazon is not an easy task. Let's take a look at the challenges you're most likely to encounter.
Blocks and Bans
Amazon is heavily protected, as you'd expect from an e-commerce website that so many people want to extract product data from. Some of its anti-bot mechanisms include CAPTCHAs, invisible bot-detection challenges, behavioral analysis, and more.
It can be difficult to bypass these security measures, especially if you're running multiple Amazon scraping instances. The most reliable way to avoid getting blocked is to use a web scraping tool like ZenRows.
ZenRows bypasses all anti-bot mechanisms under the hood and allows you to focus on your scraping logic. We'll explain more below.
Changes in Page Layout
Amazon frequently modifies its DOM structure, including CSS selectors. Such changes often break your previous parsers, causing your scraper to fail.
As a remedy, ensure you monitor the web page frequently for changes in DOM layout or HTML attributes and update your code regularly. To make your job easier, consider separating your CSS selectors from your scraping logic to make them easily editable.
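For example, here's a minimal sketch of that idea (the structure is our own suggestion, not code from this tutorial): keep every selector in one dictionary so a DOM change only requires editing a single place.

# keep all selectors in one place so a layout change only requires editing this dictionary
SELECTORS = {
    "name": {"id": "productTitle"},
    "price": {"class_": "a-offscreen"},
    "rating_count": {"id": "acrCustomerReviewText"},
}

def extract_text(soup, key, default="N/A"):
    # look up the selector by key and fail gracefully if the element is missing
    element = soup.find(**SELECTORS[key])
    return element.text.strip() if element else default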
You can also scrape Amazon with Scrapy, a powerful Python framework, so feel free to check our guide.
A Surefire Way to Scrape Amazon With Python
A web scraping API is the best solution for scraping any website without getting blocked. It's compatible with any programming language, and you can implement it within a few minutes. Another advantage of using a web scraping API is that it always works despite changes in anti-bot measures.
ZenRows is one of the most popular web scraping APIs that reliably bypass any blocks. It features an Amazon web scraper explicitly designed to extract the correct data from Amazon without hassle.
To try it out with the previous product page, sign up to load the ZenRows Request Builder. Paste the product URL in the link box and activate Premium Proxies and JS Rendering. Choose Python as your programming language and select the API connection mode. Then, copy and paste the generated code into your Python file:

The generated code should look like this:
# pip install requests
import requests

url = (
    "https://www.amazon.com/Portable-Mechanical-Keyboard-MageGee-Backlit/dp/B098LG3N6R/"
)
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
    "autoparse": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)
The above code outputs the following JSON data:
{
    "answers": "Search this page",
    "availability": "In Stock",
    "avg_rating": "4.4 out of 5 stars",
    "category": "Video Games > PC > Accessories > Gaming Keyboards",
    "discount": "-35%-19%",
    "out_of_stock": false,
    "price": "$24.94",
    "title": "MageGee Portable 60% Mechanical Gaming Keyboard, MK-Box LED Backlit Compact 68 Keys Mini Wired Office Keyboard with Red Switch for Windows Laptop PC Mac - Black/Grey",
    "features": [
        {"Product Dimensions": "12.13 x 3.98 x 1.54 inches"},
        {"Item Weight": "1.5 pounds"},
        {"Manufacturer": "MageGee"},
        # ... omitted for brevity,
    ],
    "ratings": [
        {"5 star": "67%"},
        {"4 star": "17%"},
        # ... omitted for brevity,
    ],
    "images": ["https://m.media-amazon.com/images/I/618zZ7u3sUL._AC_SL1500_.jpg"],
}
Congratulations! You've just scraped data from Amazon with ease using ZenRows. Your scraper will now bypass potential and active anti-bot measures.
Conclusion
In this tutorial, you've seen how to scrape Amazon product pages and listings using Requests and BeautifulSoup in Python. Here's a summary of what you've learned:
- Get the full-page HTML of an Amazon product page.
- Scrape specific product details from an Amazon product page.
- Extract data from Amazon's search result pages.
- Crawl several Amazon listings by implementing pagination with the Requests library.
- Export the extracted data to a CSV file.
- Understand the challenges of scraping data from Amazon.
All things considered, web scraping Amazon can be challenging. We recommend using the ZenRows Amazon scraper, which gets you all the data you want without stress.
Try ZenRows for free now without a credit card!
Frequent Questions
How Does Amazon Detect Scraping?
Amazon detects scraping activities by checking your IP address, browser parameters, User Agent, and referrer header, among other details. Once it flags you as a bot, the website throws a CAPTCHA. If your scraper can't solve the CAPTCHA puzzle, Amazon may block your IP address.
Does Amazon Allow Web Scraping?
Definitely! But there's a caveat: Amazon uses rate-limiting and can block your IP address if you overburden the website. It also checks HTTP headers and blocks you if your activity seems suspicious.
If you try to crawl through multiple pages simultaneously without using rotating proxies, you can get blocked. Amazon's web pages also have different structures, and even different product pages have different HTML structures. Building a robust web crawling application can take a lot of work.
However, scraping product prices, reviews and listings is legal.