Did you know that with the right tools and some guidance, you can access Walmart's product data efficiently without getting blocked? Over the years, scraping data from websites like Walmart has proven invaluable, providing actionable business insights. Imagine being able to run competitive price analyses in real time.
In this guide, we'll walk you through the step-by-step process of retrieving product information from Walmart. You'll also learn strategies to avoid getting blocked and how to store scraped data in a useful format for analysis.
- Step 1: Prerequisites.
- Step 2: Scrape Walmart product data.
- Step 3: Export scraped Walmart data to CSV.
- Step 4: Scraping multiple pages using Python.
- Easiest solution to scrape Walmart.
Step 1: Prerequisites
Before we dive into web scraping Walmart, ensure you meet the following prerequisites:
- Python.
- Requests.
- BeautifulSoup.
Follow the steps below to make sure everything's in place.
Run the following command in your terminal to verify your Python installation.
python --version
If Python runs on your machine, this command will return its version, as in the example below.
Python 3.13.0
Next, install Python requests and BeautifulSoup using pip.
pip3 install requests beautifulsoup4
That's it. You're all set up.
Navigate to a directory where you'd like to store your code, create a Python file (`scraper.py`), open it using your preferred IDE, and get ready to write some code.
Step 2: Scrape Walmart Product Data
For a hands-on approach, we'll use the following Walmart product page as a target website.
We'll start by retrieving the full HTML of the page and then proceed to extract the following data:
- Product name.
- Price.
- Images.
- Description.
- Reviews.
Each step will break down how to identify and scrape each piece of data. By the end, you'll have a Walmart scraper capable of retrieving and storing complete data sets in structured formats.
So, without further ado, let's dive in.
Below is a basic scraper to fetch the full HTML of the target page.
import requests
# define target URL
url = "https://www.walmart.com/ip/Logitech-MX-Master-3S-Wireless-Performance-Mouse-Ergo-8K-DPI-Quiet-Clicks-USB-C-Black/731473988"
# make a GET request to the target URL
response = requests.get(url)
# retrieve HTML content
html = response.text
print(html)
This code outputs the HTML of the page, as seen below. We've truncated the result for brevity.
<html lang="en">
<head>
  <!-- ... -->
  <title>
    Logitech MX Master 3S, Ergonomic Wireless Mouse, 8K DPI, Silent Clicks, USB-C, Compatible with Computers, Black - Walmart.com
  </title>
  <!-- ... -->
</head>
<!-- ... -->
</html>
However, you'll most likely get a page asking you to prove you're human, similar to the one below.
<html lang="en">
<head>
  <title>Robot or human?</title>
  <!-- ... -->
  if (alt=='a') {
    document.getElementById('message').innerHTML = '<p>Check the box to confirm that you\'re human. Thank You!</p>';
  }
  <!-- ... -->
</head>
<!-- ... -->
</html>
If you're getting this page, Walmart is blocking your request. This is a common web scraping challenge, as websites like Walmart employ anti-bot solutions to mitigate bot traffic.
No need to worry, though. In a later section, we'll discuss a reliable option for avoiding detection while web scraping Walmart. You can skip ahead to it now.
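As a quick first line of defense, you can make your request look more like one from a real browser by sending a realistic `User-Agent` header. This is only a partial measure: it may get past basic checks, but it won't reliably defeat Walmart's anti-bot system. A minimal sketch:

import requests

url = "https://www.walmart.com/ip/Logitech-MX-Master-3S-Wireless-Performance-Mouse-Ergo-8K-DPI-Quiet-Clicks-USB-C-Black/731473988"

# send a browser-like User-Agent so the request looks less like a script;
# this may reduce blocks but won't bypass dedicated anti-bot systems
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/126.0.0.0 Safari/537.36"
    )
}
response = requests.get(url, headers=headers)
print(response.status_code)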
Find and Scrape the Product Name
With the HTML file ready, you can extract specific data points.
While the raw HTML appears as text, it follows a formal structure that can be parsed into the DOM (Document Object Model).
Let's start by parsing the HTML file using BeautifulSoup. Remember to import the BeautifulSoup library.
# import the required libraries
# ...
from bs4 import BeautifulSoup
# ...
# parse raw HTML file
soup = BeautifulSoup(html, "html.parser")
This converts the raw HTML file into a parse tree, allowing you to interact with elements in a parent-child structure.
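For example, once the HTML is parsed, you can navigate the tree directly. Here's a quick sanity check (the exact output depends on the page you fetched):

# ...
# access elements through the parse tree
print(soup.title.text)   # the page's <title> text
print(soup.find("h1"))   # the first <h1> element, if any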
Now, we can find and scrape the product name.
The product name is one of the most accessible data points on a product page. To retrieve this data from our target URL, follow the steps below.
Inspect the page to identify the selector or HTML elements containing the required information. You can do this by opening the target page in a browser, right-clicking the product name, and selecting Inspect.
You'll find that the product name is an `<h1>` tag with a `main-title` ID.
Using this information, select the element with the `main-title` ID and extract its text content.
# ...
# select product name element and extract its text content
product_name = soup.find(id="main-title").text.strip()
print("Product Name": product_name)
BeautifulSoup provides methods like `find()` and `find_all()` that allow you to select elements using IDs, class names, HTML tags, and any CSS selector. In this case, we used the `find()` method to select the `<h1>` element by its ID.
Here's the result:
Product Name: Logitech MX Master 3S, Wireless Performance Mouse, Ergo, 8K DPI, Quiet Clicks, USB-C, Black
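One caveat worth noting: `find()` returns `None` when no element matches, so calling `.text` on a missing element raises an `AttributeError`. Since Walmart updates its DOM regularly, a guard like the following sketch keeps the scraper from crashing if the `main-title` ID changes or a block page is served:

# ...
# guard against find() returning None (e.g., after a DOM update or a block page)
title_tag = soup.find(id="main-title")
if title_tag is not None:
    product_name = title_tag.text.strip()
else:
    product_name = "N/A"  # fallback value; adjust to your error-handling needs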
Locate and Get the Price
Extracting the product price follows a similar process to the one used for the product name: locate the element containing the required information and extract its text content.
Using the same inspection technique as in the previous section, you'll find that the price data is within a `<span>` tag with an `itemprop` attribute set to `price`.
Select the identified `<span>` tag using the `find()` method and retrieve the product price.
# ...
# select price element and extract its text content
price = soup.find("span", {"itemprop": "price"}).text.strip()
print("Product Price: ", price)
This code prints the product price to your console, as shown below.
Product Price: $91.99
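For price-comparison work, you'll usually want the price as a number rather than a string. A minimal conversion, assuming a `$`-prefixed US format like the one above:

# ...
# strip the currency symbol and thousands separators so the price
# can be used in numeric comparisons
numeric_price = float(price.replace("$", "").replace(",", ""))
print(numeric_price)  # e.g., 91.99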
Locate and Scrape Product Images
The target product page displays images in a carousel format. Any image you click or hover over appears as the primary image. This means you can scrape all product images on the page by targeting the carousel container.
Here's how:
Using the previously mentioned technique, inspect the carousel container in your browser's DevTools to locate images and identify the right selectors.
These images are within a `<div>` container whose `data-testid` attribute is set to `vertical-carousel-container`, and each image block is a nested `<div>` tag.
To retrieve this data, select the carousel container, find all `<img>` tags within it, and loop through them to extract the `src` attribute from each one.
# ...
# initialize an empty list to store image data
image_data = []
# select the carousel container
carousel_container = soup.find("div", {"data-testid": "vertical-carousel-container"})
# find all the img tags within the carousel container
images = carousel_container.find_all("img")
# loop through each img tag and extract the src attribute
for image in images:
    image_data.append(image.get("src"))
print(image_data)
The code returns a list of all the image URLs in the carousel container, as shown below. We've omitted some links for brevity.
[
'https://i5.walmartimages.com/seo/Logitech-MX-Master-3S-Wireless-Performance-Mouse-Ergo-8K-DPI-Quiet-Clicks-USB-C-Black_c1454fca-d817-4de1-ba66-042f1bd6fd36.d3df2148bddb2c6dd997c985a18bf2f5.jpeg?odnHeight=117&odnWidth=117&odnBg=FFFFFF'
# ... truncated for brevity ... #
]
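If you need the image files themselves rather than just their URLs, you could extend the loop to download each one. Below is a minimal sketch; the filenames are our own choice, and these extra requests are subject to the same blocking caveats as the page itself:

# ...
# download each scraped image URL to disk
for i, image_url in enumerate(image_data):
    img_response = requests.get(image_url)
    if img_response.status_code == 200:
        # the filename pattern (image_0.jpeg, image_1.jpeg, ...) is arbitrary
        with open(f"image_{i}.jpeg", "wb") as f:
            f.write(img_response.content)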
Scrape Product Descriptions
As in previous cases, inspect the product description to identify its selectors.
You'll find that the product description is a collection of `<li>` tags within a `<span>` with a `product-description-atf` ID.
Using this ID, select the product description container, find all `<li>` tags within it, loop through them, and extract their text content.
# ...
# initialize an empty list to store description data
description_data = []
# select the product description container
description_container = soup.find(id="product-description-atf")
# find all the li tags within the container
description_lists = description_container.find_all("li")
# loop through each li tag and extract its text content
for item in description_lists:
    description_data.append(item.text.strip())
print(description_data)
This code retrieves the product description items, strips surrounding whitespace, and outputs the following result.
[
'An iconic mouse remastered. Feel every moment of your workflow with even more precision, tactility, and performance, thanks to Quiet Clicks and an 8,000 DPI track-on-glass (4 mm minimum glass thickness) sensor.'
# ... truncated for brevity ... #
]
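Depending on how you plan to store the data, you may prefer the description as a single string rather than a list of bullet points. Joining the items is a one-liner:

# ...
# collapse the list of description bullets into one string
description_text = " ".join(description_data)
print(description_text)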
Locate and Scrape Product Reviews
The product reviews on our target page are structured as individual block elements with headings, ratings, and a review body. Thus, to extract this information, you need to select all product review blocks, loop through, and retrieve each data point (heading, rating, and review body).
As before, inspect the product reviews to locate each block element and identify the right selectors.
The product reviews are in separate `<div>` tags with multiple classes (`flex flex-start flex-column pa0 mb4-l self-stretch-l`). These classes aren't individually unique, so we need to use all of them when selecting the identified `<div>` tags to avoid matching the wrong elements.
Walmart's selectors often change due to regular DOM structure updates. When following this tutorial, ensure you double-check and update them accordingly.
Using BeautifulSoup's `find_all()` method, select all the block elements with the identified class names, loop through each block (`div`), and extract the rating, heading, and review body.
# ...
# initialize an empty list to store the reviews
reviews = []
# select all the divs with the identified class
review_blocks = soup.find_all("div", {"class": "flex flex-start flex-column pa0 mb4-l self-stretch-l"})
# loop through each div to extract the rating, heading, and body content
for block in review_blocks:
    # extract review heading
    heading_tag = block.find("h3")
    review_heading = heading_tag.text if heading_tag is not None else "No Heading"
    # extract rating
    rating = block.find("span", {"class": "w_iUH7"}).text
    # extract body content
    body_content = block.find("span", {"class": "tl-m"}).text.strip()
    # store the extracted information in a dictionary
    reviews.append({
        "heading": review_heading,
        "rating": rating,
        "body": body_content,
    })
# print the extracted reviews
for review in reviews:
    print(review)
You'll notice that not all reviews have headings. So, to avoid errors, we checked that the heading element exists before trying to access its text content.
This code stores the heading, rating, and review body in a list and logs the following output to your console.
{'heading': 'No Heading', 'rating': '5 out of 5 stars review', 'body': 'I appreciate the ability to ...'}
{'heading': 'Logitech MX Master 3S', 'rating': '5 out of 5 stars review', 'body': "The buttons ..."}
# ... truncated for brevity ... #
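Since the rating is scraped as a string like `5 out of 5 stars review`, you need to extract the number before doing any math on it. Here's a small sketch that computes the average rating, assuming the string format stays consistent:

import re

# ...
# pull the leading number out of each rating string and average the scores
scores = []
for review in reviews:
    match = re.match(r"(\d+(?:\.\d+)?) out of 5", review["rating"])
    if match:
        scores.append(float(match.group(1)))
if scores:
    print(f"Average rating: {sum(scores) / len(scores):.2f} out of 5")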
That's it. You've covered almost every aspect of web scraping Walmart. However, we can fine-tune our Walmart scraper a bit further.
Let's see how.
Step 3: Export Scraped Walmart Data to CSV
When web scraping e-commerce websites like Walmart, storing data in a structured, easy-to-use format is essential for analysis.
Python's `csv` module allows you to save scraped data in a CSV file. Here's how to modify your scraper to do so.
Let's start by organizing the scraped data. To do that, initialize an empty `product_data` list and append all the extracted data to this list. Remember to import the `csv` module.
# import the required libraries
# ...
import csv
# ...
# initialize an empty list to store all product data
product_data = []
# ...
# append scraped data to product_data list
product_data.append({
    "Product Name": product_name,
    "Price": price,
    "Images": image_data,
    "Product Description": description_data,
    "Product Reviews": reviews,
})
Next, open a CSV file in write mode, create a `DictWriter` object, and define the field names.
# ...
# open a CSV file for writing
with open("outputs.csv", mode="w", newline="", encoding="utf-8") as file:
# create a CSV writer object using DictWriter
writer = csv.DictWriter(file, fieldnames=product_data[0].keys())
Since `product_data` is a list of dictionaries, we used the keys of the first dictionary to set the column headers.
Lastly, still inside the `with` block, write the header row and the data rows.

# ...
    # write the header row to the CSV file
    writer.writeheader()
    # write the data rows to the CSV file
    for data in product_data:
        writer.writerow(data)
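One caveat: `Images`, `Product Description`, and `Product Reviews` are lists, so `DictWriter` writes their Python repr into the CSV cells. If you'd rather have cells you can reliably parse back later, one option is to JSON-encode the nested fields before writing, as in this sketch:

import json
# ...
# inside the `with` block: serialize lists and dicts as JSON strings
# so the CSV cells can be parsed back reliably later
for data in product_data:
    row = {
        key: json.dumps(value) if isinstance(value, (list, dict)) else value
        for key, value in data.items()
    }
    writer.writerow(row)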
That's it.
Now, put all the steps together to get the following complete code.
import requests
from bs4 import BeautifulSoup
import csv

# define target URL
url = "https://www.walmart.com/ip/Logitech-MX-Master-3S-Wireless-Performance-Mouse-Ergo-8K-DPI-Quiet-Clicks-USB-C-Black/731473988"

# make a GET request to the target URL
response = requests.get(url)

# retrieve HTML content
html = response.text

# parse raw HTML file
soup = BeautifulSoup(html, "html.parser")

# initialize an empty list to store all product data
product_data = []

# ... find and scrape product name ... #
# select product name element and extract its text content
product_name = soup.find(id="main-title").text.strip()

# ... locate and get the price ... #
# select price element and extract its text content
price = soup.find("span", {"itemprop": "price"}).text.strip()

# ... locate and scrape product images ... #
# initialize an empty list to store image data
image_data = []
# select the carousel container
carousel_container = soup.find("div", {"data-testid": "vertical-carousel-container"})
# find all the img tags within the carousel container
images = carousel_container.find_all("img")
# loop through each img tag and extract the src attribute
for image in images:
    image_data.append(image.get("src"))

# ... scrape product description ... #
# initialize an empty list to store description data
description_data = []
# select the product description container
description_container = soup.find(id="product-description-atf")
# find all the li tags within the container
description_lists = description_container.find_all("li")
# loop through each li tag and extract its text content
for item in description_lists:
    description_data.append(item.text.strip())

# ... locate and scrape product reviews ... #
# initialize a list to store the reviews
reviews = []
# select all the divs with the identified class
review_blocks = soup.find_all("div", {"class": "flex flex-start flex-column pa0 mb4-l self-stretch-l"})
# loop through each div to extract the rating, heading, and body content
for block in review_blocks:
    # extract review heading
    heading_tag = block.find("h3")
    review_heading = heading_tag.text if heading_tag is not None else "No Heading"
    # extract rating
    rating = block.find("span", {"class": "w_iUH7"}).text
    # extract body content
    body_content = block.find("span", {"class": "tl-m"}).text.strip()
    # store the extracted information in a dictionary
    reviews.append({
        "heading": review_heading,
        "rating": rating,
        "body": body_content,
    })

# append scraped data to product_data list
product_data.append({
    "Product Name": product_name,
    "Price": price,
    "Images": image_data,
    "Product Description": description_data,
    "Product Reviews": reviews,
})

# ... export scraped data to CSV ... #
# open a CSV file for writing
with open("outputs.csv", mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object using DictWriter
    writer = csv.DictWriter(file, fieldnames=product_data[0].keys())
    # write the header row to the CSV file
    writer.writeheader()
    # write the data rows to the CSV file
    for data in product_data:
        writer.writerow(data)

print("successfully exported to CSV")
This code creates a new `outputs.csv` file and stores the scraped data in CSV format. You'll find this file in your root project directory with content similar to the image below.
Step 4: Scraping Multiple Pages Using Python
Let's scale up. You've seen how to scrape all the valuable information from a single product page. But what if you have reviews spanning multiple pages or are interested in various products on a search results page?
Scraping multiple pages is similar to the previous steps but with the addition of a process to handle pagination.
For this tutorial, we'll use a Walmart search results page as our target URL.
Walmart's search result pages follow a predictable URL structure, which we can take advantage of to extract product data from multiple pages.
If you browse through the results, you'll notice that only the page number changes in the URL as you go from page to page. This means we can automate moving from one page to another in our code by incrementing the page number accordingly.
Follow the steps below to achieve this. To keep things simple, we'll scrape the product URLs from the first 10 pages.
Start by identifying and defining the base URL with the page number parameter.
# import the required libraries
# ...
# define the base URL
base_url = "https://www.walmart.com/search?q=mouse&page={}"
Then, like previous examples, inspect any product on the page to identify the right selector.
You'll find that each product is within a `<div>` tag with the class `ph0-xl`, and the product URL is an anchor tag within it. You'll also notice that the links in the `href` attribute are relative paths rather than complete URLs. Thus, you must concatenate them with Walmart's base URL (`https://www.walmart.com`) to get a usable link, as in the code below.
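As a side note, instead of concatenating strings by hand, you can use Python's built-in `urllib.parse.urljoin`, which also handles hrefs that happen to be absolute already. A quick sketch (the path here is a made-up example):

from urllib.parse import urljoin

# urljoin resolves relative paths against the base URL and leaves
# absolute URLs untouched
product_url = urljoin("https://www.walmart.com", "/ip/some-product/123")
print(product_url)  # https://www.walmart.com/ip/some-product/123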
Next, loop through the first 10 pages, increment the page number each time, and extract the product URL.
# ...
# initialize an empty list to store product URLs from all pages
product_urls = []
# loop through the first 10 pages
for page_num in range(1, 11):
    # construct the URL for the current page
    url = base_url.format(page_num)
    # make a GET request to the target URL and parse the raw HTML file
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # select all product listings
    product_listings = soup.find_all("div", class_="ph0-xl")
    # loop through each product container
    for listing in product_listings:
        # find the anchor tag and extract its href attribute
        product_url = listing.find("a").get("href")
        # concatenate Walmart's base URL with the relative path
        product_url = "https://www.walmart.com" + product_url
        # append the product URL to the list
        product_urls.append(product_url)
# print the collected product URLs
for url in product_urls:
    print(url)
This code loops through the first 10 pages and prints every product URL found on each page, as shown below. We've truncated the result for brevity.
https://www.walmart.com/ip/Mini-Ultralight-Wired-Gaming-Mouse-4-Kinds-RGB-Backlit-4-Levels-Adjustable-Lightweight-Honeycomb-Shell-Mice-for-PC-Gamers-Xbox-PS4-Black/806352628?classType=REGULAR&from=/search
# ... omitted for brevity ... #
Awesome! You now have a Walmart scraper capable of extracting data from multiple pages.
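One practical note before you scale this up further: spacing out your requests reduces server load and may lower your chances of being rate-limited. A simple delay between page iterations might look like this (the two-second value is our own arbitrary choice):

import time
# ...
for page_num in range(1, 11):
    url = base_url.format(page_num)
    # ... fetch and parse the page as before ...
    time.sleep(2)  # pause between requests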
Easiest Solution to Scrape Walmart
If your attempt to retrieve a page's HTML is blocked, use the ZenRows Walmart Scraper to overcome this challenge. This solution offers everything you need to avoid detection while web scraping Walmart.
With features like premium proxies, JavaScript rendering, and advanced anti-bot bypass, ZenRows allows you to focus on extracting your desired data rather than the intricacies of circumventing anti-bot solutions.
Some additional benefits of using the ZenRows Walmart Scraper API include:
✅ Extract complete product information with a few lines of code.
✅ Download structured data in JSON and easily store it in a usable format, such as CSV.
✅ Tailored scraper for Walmart listings and deals.
✅ Quickly scrape multiple pages.
To use this tool, sign up to get your free API key.
You'll be redirected to the Request Builder page, where your ZenRows API key is at the top right.
Input your target URL and activate Premium Proxies and JS Rendering boost mode.
Then, select the Python language option and choose the API option. ZenRows works with any language and provides ready-to-use snippets for the most popular ones.
Remember to select your desired output format. The `autoparse` option parses the HTML and returns a JSON result.
Lastly, copy the generated code on the right into your editor and test it.
import requests
url = 'https://www.walmart.com/ip/Logitech-MX-Master-3S-Wireless-Performance-Mouse-Ergo-8K-DPI-Quiet-Clicks-USB-C-Black/731473988'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
    'autoparse': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
This code avoids Walmart's detection, retrieves the HTML, and automatically parses it, returning a JSON result with multiple fields, as shown below.
{
"sku": "731473988",
"gtin": "097855174819",
"name": "Logitech MX Master 3S, Wireless Performance Mouse, Ergo, 8K DPI, Quiet Clicks, USB-C, Black",
"image": "https://i5.walmartimages.com/seo/Logitech-MX-Master-3S-Wireless-Performance-Mouse-Ergo-8K-DPI-Quiet-Clicks-USB-C-Black_c1454fca-d817-4de1-ba66-042f1bd6fd36.d3df2148bddb2c6dd997c985a18bf2f5.jpeg",
# ... omitted for brevity ... #
}
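Because `autoparse` returns JSON, you can also work with the response as a dictionary instead of raw text. For example, picking out the fields shown in the sample output above:

# ...
# load the JSON response and access individual fields
data = response.json()
print(data["name"])
print(data["sku"])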
Congratulations! You're now well-equipped to scrape Walmart without getting blocked.
Conclusion
Web scraping Walmart can provide actionable insights for businesses and individuals looking to run competitive market analysis. However, Walmart's anti-bot defenses can block your requests and deny you access.
For hassle-free Walmart scraping, try ZenRows now.