Zillow is a treasure trove of information for real estate enthusiasts. However, to take full advantage of its database, you'll need to automate the extraction and analysis of its data. Web scraping Zillow can help you uncover actionable insights, facilitating informed decision-making in your research and investments.
In this guide, you'll learn how to scrape Zillow without getting blocked, as well as tips for extracting detailed data and handling pagination. Let's begin!
- Step 1: Prerequisites.
- Step 2: Scrape Zillow property data.
- Step 3: Export scraped Zillow property data to CSV.
- Step 4: Scraping multiple Zillow listings.
- Easiest solution to scrape Zillow.
Step 1: Prerequisites
To follow along in this tutorial, ensure you meet these prerequisites.
- Python.
- Requests.
- BeautifulSoup.
Follow the steps below to make sure everything's in place.
Run the following command in your terminal to verify your Python installation.
python --version
If Python runs on your machine, this command will return its version, as in the example below.
Python 3.13.0
Next, create a Python virtual environment and install the Requests and BeautifulSoup libraries using pip.
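If you don't already have a virtual environment, the following commands create and activate one (macOS/Linux shown; on Windows, run venv\Scripts\activate instead):
python -m venv venv
source venv/bin/activate
With the environment active, install the libraries: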
pip3 install requests beautifulsoup4
That's it. You're all set up.
Now, navigate to a directory where you'd like to store your code, create a Python file (scraper.py), open it using your preferred IDE, and get ready to write some code.
Step 2: Scrape Zillow Property Data
The best way to learn web scraping on Zillow is by working with a real-world example. For this tutorial, we'll scrape the following Zillow property page.
After retrieving the full HTML of the page, we'll extract the following data points, one after the other.
- Address.
- Price.
- Area (sq. ft.).
- Room details.
- Images.
Each step will break down how to identify target elements and retrieve each piece of data. By the end of this tutorial, you'll have a working Zillow scraper capable of retrieving property data and storing it in a usable format.
Without further ado, let's roll.
Here's a basic Python scraper to fetch the full HTML of the target page, from which we will extract the details above.
import requests
# define target URL
url = "https://www.zillow.com/homedetails/722-N-Trumbull-Ave-Chicago-IL-60624/3810331_zpid/"
# make a GET request to the target URL
response = requests.get(url)
# retrieve HTML content
html = response.text
print(html)
Ideally, this code should output the HTML of the page. However, you're more likely to receive the following response:
<!DOCTYPE html>
<html lang="en">
<head>
<!-- -->
<title>Access to this page has been denied</title>
</head>
</html>
This is because Zillow uses anti-bot solutions that block your requests. Some recommended best practices for overcoming this challenge are using proxies in Python Requests and specifying a custom user agent.
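Here's a minimal sketch of both practices applied to the same request. The proxy URL is a placeholder, not a working endpoint; substitute credentials from your own provider.
import requests

url = "https://www.zillow.com/homedetails/722-N-Trumbull-Ave-Chicago-IL-60624/3810331_zpid/"
# spoof a common browser user agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}
# placeholder proxy endpoint; replace with your provider's details
proxies = {
    "http": "http://<PROXY_USER>:<PROXY_PASSWORD>@<PROXY_HOST>:<PROXY_PORT>",
    "https": "http://<PROXY_USER>:<PROXY_PASSWORD>@<PROXY_HOST>:<PROXY_PORT>",
}
response = requests.get(url, headers=headers, proxies=proxies)
print(response.status_code)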
If all else fails, a later section covers a more reliable, foolproof solution that lets you scrape Zillow without getting blocked.
In the meantime, here's our raw HTML. We've truncated the result for brevity.
<html lang="en">
<head>
<!-- -->
<title>
722 N Trumbull Ave, Chicago, IL 60624 | MLS #12208809 | Zillow
</title>
</head>
</html>
Find and Extract the Address
Now that you've got the raw HTML, you can extract specific data points. However, to locate and manipulate target elements within the HTML structure, you must first parse it.
Thus, create a BeautifulSoup object to parse the HTML. Remember to import the BeautifulSoup library if you haven't already done so.
# import the required libraries
# ...
from bs4 import BeautifulSoup
# ...
# parse the raw HTML file
soup = BeautifulSoup(html, "html.parser")
This converts the downloaded HTML into a structured format you can work with programmatically.
With that done, you can find and extract the address.
The address is one of the first data points on a Zillow property page. To retrieve this information, follow the steps below:
Inspect the target web page to identify the HTML element housing the address. You can do this using your browser's DevTools: navigate to the target page, right-click the address, and select Inspect.
You'll find that the address is the first and only <h1> on the page.
This is a straightforward use case, as you can quickly employ BeautifulSoup's find() method to locate the target element without specifying other attributes.
Next, select the target element and extract its text content.
# ...
# select h1 and extract its text content
address = soup.find("h1").text.strip()
print(f"Address: {address}")
This code logs the property's address to your console, as shown below:
Address: 722 N Trumbull Ave, Chicago, IL 60624
Locate and Get the Price
Let's move on to the next key data point: the property's price. In this step, we'll identify the target element and extract its text content just as we did with the address.
Here's how:
Locate the price within the raw HTML using the same inspection technique as discussed in the previous section.
You'll find that the price data is within a <span> tag with the class price-text.
Applying this information, locate the identified <span> tag using BeautifulSoup's find() method. Then, retrieve its text content.
# ...
# select span with class price-text and extract its text content
price = soup.find("span", {"class":"price-text"}).text.strip()
print(f"Price: {price}")
This will output the price as shown below:
Price: $399,000
Locate and Get the Area (sq.ft.)
The property area is located just below the address on the target page. Like the previous steps, examine the website's HTML to identify the element or tag containing the area.
The area is within a <span> element with the class name Text-c11n-8-100-1__sc-aiai24-0 hdp__sc-6k0go5-3, nested in a <div> container.
These CSS selectors are dynamic and often change with frequent DOM updates. Thus, when following this tutorial, ensure you double-check and update them accordingly.
However, all the <span> elements in that <div> container share the same class names. Therefore, unlike previous steps, the find() method will not work, as it only returns the first element matching the specified criteria.
So, what do we do?
You can filter down to the target data using its position in the hierarchy when you have multiple elements with the same CSS selectors.
To do that, select all the <span> elements using BeautifulSoup's find_all() method, locate the element at the desired position, and extract its text content. The target element is the third <span> in the hierarchy, and here's the code to locate it and get its text content (the area in sqft).
# ...
# select all identified span elements
spans = soup.find_all("span", {"class":"Text-c11n-8-100-1__sc-aiai24-0 hdp__sc-6k0go5-3 jbRdkh llcOCk"})
# access the span at the desired position
position = 2
if len(spans) > position:
    area = spans[position].text.strip()
    print(f"Area: {area}")
else:
    print("Area not found")
This will log the property's area to your console, as shown below.
Area: 3,720 sqft
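Because these class names are auto-generated, a more defensive fallback is to search by the visible text instead. The sketch below assumes the number and its "sqft" unit share a single text node, which you should verify on the live page:
import re

# fallback: locate the area by its visible "sqft" text instead of volatile class names
sqft_node = soup.find(string=re.compile(r"sqft"))
if sqft_node:
    print(f"Area: {sqft_node.strip()}")
else:
    print("Area not found")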
Locate and Extract Room Details
The room details are located in the facts and features section. This property has five bedrooms, three bathrooms, a dining room, a kitchen, a family room, a laundry room, and a living room.
Each room is structured as an individual block element with headings and details. This means you can retrieve each room's data by selecting all room blocks, looping through each, and extracting their text content (heading and features).
Below is a step-by-step guide.
As before, inspect the page to identify the right selectors for each block element.
You'll find that all the room details (blocks) are within a <div> tag with the class names styles__StyledCategories-fshdp-8-100-2__sc-1mj0p8k-0 bNUjPE, and each block is a separate <div>.
Using this information, select the <div> container, find all the room details within it, and loop through each, extracting their text content (heading and features).
# create an empty list to store room data
room_data = []
# select the container div
container = soup.find("div", {"class": "styles__StyledCategories-fshdp-8-100-2__sc-1mj0p8k-0 bNUjPE"})
# within the container, find all div elements
room_details = container.find_all("div")
# loop through each div and extract the heading and room features
for room_detail in room_details:
    # skip wrapper divs that don't contain a heading
    heading_element = room_detail.find("h6")
    if heading_element is None:
        continue
    # extract the heading
    heading = heading_element.text.strip()
    # extract the room features
    details = [item.text.strip() for item in room_detail.find_all("li")]
    room_data.append({
        "heading": heading,
        "details": details
    })
print(room_data)
This code stores the room details in a list and prints the list as shown below.
[
{'heading': 'Bedrooms & bathrooms', 'details': ['Bedrooms: 5', 'Bathrooms: 3', 'Full bathrooms: 3']}, {'heading': 'Primary bedroom', 'details': ['Features: Flooring (Hardwood), Window ..]}
# ... truncated for brevity ... #
]
Extract Property Images
The target page contains numerous images you can access by clicking its See all media button. For pages requiring JavaScript interactions, you may want to use Python's headless browser options. However, we'll focus on extracting only the featured images, which you can find in a carousel slide at the top of the page.
As in previous steps, inspect the page to identify the right selectors.
You'll find that the hero images are <img> tags nested in <picture> elements within multiple <span> tags. These spans all have their data-testid attribute set to hero-carousel-picture.
Therefore, to extract this image data, select all identified <span> tags, loop through each, and extract each image's src attribute.
#...
# create an empty list to store image data
image_data = []
# select all identified <span> tags
spans = soup.find_all("span", {"data-testid": "hero-carousel-picture"})
# loop through each span and extract each image's src attribute
for span in spans:
    img = span.find("img")
    # skip spans without an <img> child
    if img is not None:
        # append image URL to image_data list
        image_data.append(img.get("src"))
print(image_data)
This will output a list containing each property image's URL, as shown below.
[
'https://photos.zillowstatic.com/fp/6f79fe95902da3b29c689c9d03481b50-sc_1152_768.jpg', 'https://photos.zillowstatic.com/fp/0bde4d044f2d1b0b31a5142ed4262671-sc_1152_768.jpg',
# ... omitted for brevity ... #
]
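If you also want to save these images locally, here's a minimal sketch. It assumes each URL is directly downloadable and simply numbers the files:
import os

# create a folder for the downloaded images
os.makedirs("images", exist_ok=True)
# download each image and write it to disk
for index, image_url in enumerate(image_data):
    image_response = requests.get(image_url)
    if image_response.ok:
        with open(f"images/property_{index}.jpg", "wb") as file:
            file.write(image_response.content)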
Step 3: Export Scraped Zillow Property Data to CSV
Storing data in structured formats makes analyzing and gaining valuable insights easier. You can export Zillow data to CSV using Python's csv module.
Here's how to modify your code accordingly.
Let's start by organizing our scraped data. To do that, create an empty zillow_property_data list and append all the extracted data to this list. Remember to import Python's csv module.
# import the required libraries
# ...
import csv
# ...
# initialize an empty list to store all scraped data
zillow_property_data = []
# ...
# append scraped data to zillow_property_data list
zillow_property_data.append({
    "Address": address,
    "Price": price,
    "Area (sqft)": area,
    "Room details": room_data,
    "Images": image_data
})
Next, open a CSV file in write mode, create a DictWriter object, and define the field names.
# ...
# open a CSV file for writing
with open("output.csv", mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object using DictWriter
    writer = csv.DictWriter(file, fieldnames=zillow_property_data[0].keys())
Since zillow_property_data is a list of dictionaries, use the first dictionary's keys to set the column headers.
Lastly, write the header row and the data rows.
    # ...
    # write the header row to the CSV file
    writer.writeheader()
    # write the data rows to the CSV file
    for data in zillow_property_data:
        writer.writerow(data)
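One caveat: the Room details and Images columns hold Python lists, which the csv module writes as their repr strings. If you'd rather keep those cells machine-readable, one option (a sketch, not part of the tutorial flow) is to JSON-encode the nested fields before opening the file:
import json

# JSON-encode nested fields so the CSV cells remain machine-readable
for data in zillow_property_data:
    data["Room details"] = json.dumps(data["Room details"])
    data["Images"] = json.dumps(data["Images"])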
That's it.
Now, modify your code using the above steps to get the following complete code.
# import the required libraries
import requests
from bs4 import BeautifulSoup
import csv
# define target URL
url = "https://www.zillow.com/homedetails/722-N-Trumbull-Ave-Chicago-IL-60624/3810331_zpid/"
# make a GET request to the target URL
response = requests.get(url)
# retrieve HTML content
html = response.text
# parse the raw HTML file
soup = BeautifulSoup(html, "html.parser")
# initialize an empty list to store all scraped data
zillow_property_data = []
# ... find and extract the address ... #
# select h1 and extract its text content
address = soup.find("h1").text.strip()
# ... locate and get the price ... #
# select span with class price-text and extract its text content
price = soup.find("span", {"class":"price-text"}).text.strip()
# ... locate and get the area (sqft) ... #
# select all identified span elements
spans = soup.find_all("span", {"class":"Text-c11n-8-100-1__sc-aiai24-0 hdp__sc-6k0go5-3 jbRdkh llcOCk"})
# access the span at the desired position
position = 2
if len(spans) > position:
    area = spans[position].text.strip()
else:
    # default to None so the later append doesn't raise a NameError
    area = None
    print("Area not found")
# ... locate and extract room details ... #
# create an empty list to store room data
room_data = []
# select the container div
container = soup.find("div", {"class": "styles__StyledCategories-fshdp-8-100-2__sc-1mj0p8k-0 bNUjPE"})
# within the container, find all div elements
room_details = container.find_all("div")
# loop through each div and extract the heading and room features
for room_detail in room_details:
    # skip wrapper divs that don't contain a heading
    heading_element = room_detail.find("h6")
    if heading_element is None:
        continue
    # extract the heading
    heading = heading_element.text.strip()
    # extract the room features
    details = [item.text.strip() for item in room_detail.find_all("li")]
    room_data.append({
        "heading": heading,
        "details": details
    })
# ... extract hero images ... #
# create an empty list to store image data
image_data = []
# select all identified <span> tags
spans = soup.find_all("span", {"data-testid": "hero-carousel-picture"})
# loop through each span and extract each image's src attribute
for span in spans:
    img = span.find("img")
    # skip spans without an <img> child
    if img is not None:
        # append image URL to image_data list
        image_data.append(img.get("src"))
# append all scraped data to zillow_property_data list
zillow_property_data.append({
    "Address": address,
    "Price": price,
    "Area (sqft)": area,
    "Room details": room_data,
    "Images": image_data
})
# open a CSV file for writing
with open("output.csv", mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object using DictWriter
    writer = csv.DictWriter(file, fieldnames=zillow_property_data[0].keys())
    # write the header row to the CSV file
    writer.writeheader()
    # write the data rows to the CSV file
    for data in zillow_property_data:
        writer.writerow(data)
This code creates a new output.csv file and stores the scraped Zillow data in CSV format. You'll find this file in your root directory.
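To sanity-check the export, you can read the file back with csv.DictReader:
# read the CSV back to confirm the export worked
with open("output.csv", newline="", encoding="utf-8") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row["Address"], row["Price"])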
Step 4: Scraping Multiple Zillow Listings
It's time to scale up a bit. You've seen how Zillow web scraping works on a single property page. If you want data from multiple listings or a search result page, you must account for multiple pages.
Scraping Zillow listings that span multiple pages follows a similar approach to the previous steps, with the addition of a process for handling pagination: you need to navigate through each page and extract all the necessary data.
Here's a step-by-step guide on how to achieve that.
For this tutorial, we'll scrape Zillow search results for a given location, extracting only the address of each listing.
Let's start by defining the base URL, which is your initial search result page.
Next, create a function to extract addresses from the current page using techniques similar to previous steps.
As before, inspect a listing to identify the target elements and selectors.
You'll find that each listing is an <article> tag, and each address is within an <address> tag.
Using this information, create a function to extract addresses.
# import the required libraries
# ...
# define base URL
base_url = "https://www.zillow.com/brooklyn-new-york-ny/"
# function to scrape addresses from a single page
def extract_addresses(soup):
    # initialize an empty list to store address data
    addresses = []
    # find all addresses on the current page
    address_elements = soup.find_all("address")
    for address in address_elements:
        # append address to addresses
        addresses.append(address.text.strip())
    return addresses
This will return a list of addresses on the current page.
Next, create a function to handle pagination. To achieve this, browse through the page to identify the page's pagination structure. If you scroll to the bottom of the page, you'll find arrows for navigating the search result pages.
Starting from the initial search result, you can continuously click the forward arrow to access the next pages. In other words, the next page's URL is the href attribute of the Next arrow button.
Therefore, to create this function, locate the Next arrow button and extract its href attribute. Again, inspect the page's pagination to identify the right selectors.
You'll notice that the next arrow is an anchor tag with its title attribute set to Next page.
Using this information, here's the function for handling pagination.
#...
# function to handle pagination
def get_next_page_url(soup):
    # identify the next arrow button and retrieve its href attribute
    next_arrow_button = soup.find("a", {"title": "Next page"})
    # if the next page exists, return its href; otherwise, return None
    if next_arrow_button and "href" in next_arrow_button.attrs:
        return next_arrow_button["href"]
    return None
This function retrieves the href attribute of the next arrow button on the current page and returns None when there's no next page, signaling the scraper to stop.
Next, using both functions above, create another function to extract addresses from each search result page.
This function will scrape the current page, navigate to the next page, and repeat until there are no more pages. It'll also add a slight time delay between pages to avoid triggering anti-bot restrictions.
Also, the href attributes of the next arrow buttons are relative paths, so you have to concatenate them with https://www.zillow.com to get complete URLs.
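As a side note, if you'd rather not hardcode the domain, Python's urllib.parse.urljoin resolves relative paths against a base URL (and passes absolute URLs through unchanged). A minimal sketch, using an illustrative path:
from urllib.parse import urljoin

# resolve a relative href against the current page's URL
next_url = urljoin("https://www.zillow.com/brooklyn-new-york-ny/", "/brooklyn-new-york-ny/2_p/")
print(next_url)  # https://www.zillow.com/brooklyn-new-york-ny/2_p/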
# import the required libraries
#...
import time
# ...
# function to extract addresses from each search result page
def scrape_zillow_pages(base_url):
    current_url = base_url
    # initialize an empty list to store all addresses
    all_addresses = []
    # make a GET request to each page until there are no more pages
    while current_url:
        print(f"Scraping: {current_url}")
        response = requests.get(current_url)
        # parse the raw HTML of the current page
        soup = BeautifulSoup(response.text, "html.parser")
        # call the extract_addresses function to extract addresses from the current page
        addresses = extract_addresses(soup)
        # add addresses to the all_addresses list after scraping the current page
        all_addresses.extend(addresses)
        print(f"Found {len(addresses)} addresses on this page.")
        # get the URL for the next page and stop scraping if next page is None
        next_page = get_next_page_url(soup)
        if next_page:
            # concatenate relative paths with https://www.zillow.com
            current_url = f"https://www.zillow.com{next_page}"
        else:
            print("No more pages to scrape.")
            break
        # add a time delay after scraping each page
        time.sleep(2)
    return all_addresses
Lastly, call the scrape_zillow_pages() function to start the scraping process.
# ...
# call scrape_zillow_pages() to start the scraping process
addresses = scrape_zillow_pages(base_url)
That's it.
Now, put all the steps together and handle your output.
# import the required libraries
import requests
from bs4 import BeautifulSoup
import csv
import time
# define base URL
base_url = "https://www.zillow.com/brooklyn-new-york-ny/"
# function to scrape addresses from a single page
def extract_addresses(soup):
    # initialize an empty list to store address data
    addresses = []
    # find all addresses on the current page
    address_elements = soup.find_all("address")
    for address in address_elements:
        # append address to addresses
        addresses.append(address.text.strip())
    return addresses
# function to handle pagination
def get_next_page_url(soup):
    # identify the next arrow button and retrieve its href attribute
    next_arrow_button = soup.find("a", {"title": "Next page"})
    # if the next page exists, return its href; otherwise, return None
    if next_arrow_button and "href" in next_arrow_button.attrs:
        return next_arrow_button["href"]
    return None
# function to extract addresses from each search result page
def scrape_zillow_pages(base_url):
    current_url = base_url
    # initialize an empty list to store all addresses
    all_addresses = []
    # make a GET request to each page until there are no more pages
    while current_url:
        print(f"Scraping: {current_url}")
        response = requests.get(current_url)
        # parse the raw HTML of the current page
        soup = BeautifulSoup(response.text, "html.parser")
        # call the extract_addresses function to extract addresses from the current page
        addresses = extract_addresses(soup)
        # add addresses to the all_addresses list after scraping the current page
        all_addresses.extend(addresses)
        print(f"Found {len(addresses)} addresses on this page.")
        # get the URL for the next page and stop scraping when there are no more pages
        next_page = get_next_page_url(soup)
        if next_page:
            # concatenate relative paths with https://www.zillow.com
            current_url = f"https://www.zillow.com{next_page}"
        else:
            print("No more pages to scrape.")
            break
        # add a time delay after scraping each page
        time.sleep(2)
    return all_addresses
# start the scraping process
addresses = scrape_zillow_pages(base_url)
# export to CSV
with open("zillow_addresses.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Address"])
    for address in addresses:
        writer.writerow([address])
This code scrapes all the search result pages and stores the output in a CSV file.
Congratulations! You now have a Zillow scraper for scraping multiple pages.
Easiest Solution to Scrape Zillow
Zillow uses advanced anti-bot detection techniques that can block your requests and deny you access. If you're facing this challenge, ZenRows' Zillow Scraper API offers a reliable and foolproof solution.
This tool provides everything you need to avoid detection while web scraping Zillow.
With features like premium proxies, JavaScript rendering, fingerprinting evasion, actual user spoofing, request header management, advanced anti-bot bypass, and more, ZenRows allows you to focus on extracting your desired data rather than the intricacies of circumventing anti-bot solutions.
Some additional benefits of using the ZenRows Zillow Scraper API include:
✅ Extract data from millions of properties around the US with a few lines of code.
✅ Download structured data in JSON and easily store it in a usable format, such as CSV.
✅ Get automatically generated datasets in CSV format.
✅ Quickly scrape multiple pages.
To use this tool, sign up to get your free API key.
You'll be redirected to the Request Builder page, where your ZenRows API key is at the top right.
Input your target URL and activate Premium Proxies and JS Rendering boost mode.
Then, select the Python language and choose the API option. ZenRows works with any language and provides ready-to-use snippets for the most popular ones.
Remember to select your desired output format. The autoparse option parses the HTML and returns a JSON result.
Lastly, copy the generated code on the right to your editor for testing.
Using the same target property page as before, your code should look like this:
import requests

url = 'https://www.zillow.com/homedetails/722-N-Trumbull-Ave-Chicago-IL-60624/3810331_zpid/'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
    'autoparse': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
This code bypasses Zillow's anti-bot restrictions, retrieves the HTML, and automatically parses it to return the following JSON result with multiple fields.
{
"bathrooms": 3,
"bedrooms": 5,
"city": "Chicago",
"cityId": 17426,
"country": "USA",
"currency": "USD",
"datePosted": "",
"daysOnZillow": 13,
"description": "A stunning new construction in the Humboldt Park ...",
# ... omitted for brevity ... #
}
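From here, you can load the response as a dictionary and pick out the fields you need, as in this quick sketch based on the fields shown above:
# parse the JSON response into a Python dict for further processing
data = response.json()
print(f"{data.get('bedrooms')} bd / {data.get('bathrooms')} ba in {data.get('city')}")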
Awesome! You're now well-equipped for web scraping Zillow at scale.
Conclusion
You've learned how to scrape a single Zillow page, extract information from multiple pages, and also store scraped data in a structured format. By combining these techniques, you're well-equipped to take advantage of Zillow's vast property database.
Just keep in mind that frequent layout changes and anti-bot restrictions can make web scraping Zillow challenging.
Luckily, ZenRows addresses these obstacles, allowing you to focus on extracting your desired data. For hassle-free Zillow scraping, try ZenRows now.