Idealista is one of the biggest real estate marketplaces in Spain, Italy, and Portugal. In real estate, having real-time data at your fingertips can make all the difference. Whether you're looking for investment opportunities or a competitive edge, Idealista web scraping can provide you with all the data you need.
In this guide, we'll walk you through scraping Idealista in a few steps. You'll also learn strategies to avoid detection, handle pagination, and store scraped data in a structured format.
- Step 1: Prerequisites.
- Step 2: Scrape Idealista property data.
- Step 3: Export scraped Idealista property data to CSV.
- Scraping multiple Idealista listings.
- Easiest solution to scrape Idealista.
Step 1: Prerequisites
Before we start web scraping, ensure you meet the following prerequisites.
- Python.
- Requests.
- BeautifulSoup.
Here are the steps to get everything in place.
Run the following command in your terminal to verify your Python installation.
python --version
If Python is installed on your machine, this command will return its version, as in the example below.
Python 3.13.0
Next, install the Python Requests and BeautifulSoup libraries using pip.
pip3 install requests beautifulsoup4
That's it. You're all set up.
If you haven't already, navigate to the directory where you'd like to store your code, then create and open a Python file (scraper.py) in your preferred IDE. Get ready to write some code.
Step 2: Scrape Idealista Property Data
For demonstration purposes, we'll scrape the following Idealista property page.
After retrieving the target page's raw HTML, we'll extract the following data points:
- Address.
- Price.
- Description.
- Room details.
- Image.
Each step will break down how to identify target HTML elements and retrieve the necessary information.
By the end of this tutorial, you'll be well-equipped to scrape Idealista at scale and store scraped data in a structured format.
To begin, here's a basic script to retrieve the raw HTML of the target page.
# pip3 install requests beautifulsoup4
import requests
# define target URL
url = "https://www.idealista.com/inmueble/105043094/"
# make a GET request to the target URL
response = requests.get(url)
# retrieve HTML content
html = response.text
print(html)
This code makes a GET request to Idealista and returns the full HTML, as shown below.
<html lang="es" env="es" username="" data-userauth="false" class="">
  <head>
    <!-- ... -->
    <title>
      Piso en venta en paseo de Gràcia, La Dreta de l'Eixample, Barcelona -- idealista
    </title>
    <!-- ... -->
  </head>
</html>
However, you'll most likely get Idealista's CAPTCHA page asking you to prove you're human. This is because the website's anti-bot restriction is blocking your request and denying you access to its data.
To overcome this challenge, you can try commonly recommended configurations, such as routing your requests through proxies and setting a custom User-Agent header. If those fail, we cover a more reliable and foolproof method in a later section.
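As a rough sketch, here's how you might apply both tweaks with Requests. The User-Agent string and proxy address below are placeholders, not values from this tutorial, so substitute your own:

# set a browser-like User-Agent header (placeholder string; use a current one)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

# route the request through a proxy (replace with your own proxy address)
proxies = {
    "http": "http://<PROXY_HOST>:<PROXY_PORT>",
    "https": "http://<PROXY_HOST>:<PROXY_PORT>",
}

response = requests.get(url, headers=headers, proxies=proxies)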
Find and Extract the Address
With the raw HTML ready, you can extract valuable information, like a property address. But to do that, you must first parse the HTML file.
To do that, import the BeautifulSoup library and create a soup object.
# import the required libraries
# ...
from bs4 import BeautifulSoup
# ...
# parse the raw HTML file
soup = BeautifulSoup(html, "html.parser")
This code creates a structured representation of the HTML file, allowing you to navigate the Document Object Model (DOM) to extract specific data points.
You can now find and extract the address using the steps below.
Start by inspecting the page to identify the HTML element containing the address: visit the target page in a browser, right-click on the address, and select Inspect.
You'll notice that the property's address is within a <span> element with the class main-info__title-minor.
Using this information, call BeautifulSoup's find() method to locate the identified <span> element and extract its text content (the address).
# ...
# ... find and extract the address ... #
# select span element and extract its text content
address = soup.find("span", {"class": "main-info__title-minor"}).text.strip()
print(f"Address: {address}")
This code prints the address to your terminal, as shown below.
Address: La Dreta de l'Eixample, Barcelona
Locate and Get the Price
The next key data point is the property's price. Using inspection techniques similar to those in the previous section, locate the element containing this data point.
The price is within a <span> element with the class info-data-price.
As you did with the address, locate the span element and extract its text content.
# ...
# ... locate and get the price ... #
# select span element and extract its text content
price = soup.find("span", {"class": "info-data-price"}).text.strip()
print(f"Price: {price}")
This will log the property's price to your terminal.
Price: 1.350.000 €
Extract Room Details
The room details are structured as list items with headings. To extract this data, inspect the page to identify the right selectors.
You'll notice they're within a parent <div> container with the class details-property. The headings are <h2> tags, while the features are <li> items nested in <div> blocks with the class details-property_features.
Using this information, select the parent <div> container. Then, within this container, find all <h2> tags and their corresponding feature list items (the <div> blocks with the class details-property_features).
As a good coding practice, check that a selection (in this case, details_container) exists before using it to access other elements. This prevents errors when a selector doesn't match anything.
# ...
# ... extract room details ... #
# initialize an empty list to store room details
room_details = []
# select the main container div
details_container = soup.find("div", {"class": "details-property"})
if details_container:
    # find all h2s and the corresponding feature divs
    headings = details_container.find_all("h2")
    features = details_container.find_all("div", {"class": "details-property_features"})
Then, iterate through the features and headings together and extract their text content.
# ...
# iterate through headings and features
for heading, feature in zip(headings, features):
    property_feature = [li.text.strip() for li in feature.find_all("li")]
    room_details.append({
        "heading": heading.text.strip(),
        "features": property_feature
    })
# log the result
print(room_details)
Now, combine the above steps to get the complete code for extracting room details.
# ...
# ... extract room details ... #
# initialize an empty list to store room details
room_details = []
# select the main container div
details_container = soup.find("div", {"class": "details-property"})
if details_container:
    # find all h2s and the corresponding feature divs
    headings = details_container.find_all("h2")
    features = details_container.find_all("div", {"class": "details-property_features"})

    # iterate through headings and features
    for heading, feature in zip(headings, features):
        property_feature = [li.text.strip() for li in feature.find_all("li")]
        room_details.append({
            "heading": heading.text.strip(),
            "features": property_feature
        })

# log the result
print(room_details)
This code extracts the room details into a list and prints the list below.
[
{'heading': 'Basic features', 'features': ['115 m² built', '3 bedrooms', '2 bathrooms', 'Second hand/good condition', 'Storage room', 'North, east orientation', 'Built in 1920', 'Individual heating']},
# ... truncated for brevity ... #
]
Locate and Extract Property Description
The property description is found in the Advertiser's Comment section. As in previous steps, inspect the page to identify the HTML elements containing the desired data.
You'll find that the description is within a <p> tag nested in a <div> container with the class comment.
Using this information, select the <div> container, find the <p> tag within it, and extract its text content.
# ...
# ... locate and extract property description ... #
# select div container
description_container = soup.find("div", {"class": "comment"})
if description_container:
    # find <p> tag within the container and extract its text content
    description = description_container.find("p").text.strip()
    print(f"Property Description: {description}")
Here's the result:
Property Description: This luxury apartment is located on Paseo de Gracia, in a building designed by the prestigious architect Josep Puig i Cadafalch,...
# ... truncated for brevity ... #
Locate and Get Property Images
You can find the property images in the Photos section of the page. To extract their URLs, inspect the page to identify the HTML elements containing the images.
The images are inside <img> tags within <div> blocks with the class placeholder-multimedia.
Note that these selectors on the Idealista property page are dynamic and may have changed by the time you read this. When following the tutorial, double-check them in your browser and update your code accordingly.
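One way to make your scraper more resilient to such changes is a small helper that returns None instead of raising an error when a selector no longer matches. This helper isn't part of the tutorial's main script; it's just a defensive pattern you can adopt:

# hypothetical helper: return an element's text, or None if the selector no longer matches
def safe_text(soup, tag, class_name):
    element = soup.find(tag, {"class": class_name})
    return element.text.strip() if element else None

# example usage with the price selector from earlier in this tutorial
price = safe_text(soup, "span", "info-data-price")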
Using this information, select all the image divs, loop through each, and extract each image's src attribute.
# ...
# ... locate and get property images ... #
# create an empty list to store image data
image_data = []
# select images div container
image_containers = soup.find_all("div", {"class": "placeholder-multimedia"})
if image_containers:
    # loop through each div, select the img tags, and extract the src attribute
    for container in image_containers:
        image = container.find("img")
        image_data.append(image.get("src"))

print(image_data)
This code stores all image URLs in a list and outputs the following result to your console.
[
'https://img4.idealista.com/blur/WEB_DETAIL-XL-L/0/id.pro.es.image.master/9a/b9/f3/1240865377.jpg', 'https://img4.idealista.com/blur/WEB_DETAIL-XL-L/0/id.pro.es.image.master/67/71/57/1240865413.jpg', 'https://img4.idealista.com/blur/WEB_DETAIL-XL-L/0/id.pro.es.image.master/85/72/c8/1240865408.jpg',
# ... truncated for brevity ... #
]
Step 3: Export Scraped Idealista Property Data to CSV
It's often easier to analyze scraped data when it's stored in a structured format, and Python's built-in csv module lets you export it to CSV.
Start by organizing the scraped data: create an empty property_data list and append all the scraped values to it. Remember to import Python's csv module.
# import the required libraries
# ...
import csv
# ...
# initialize an empty list to store all scraped data
property_data = []
# ...
# append scraped data to property_data list
property_data.append({
    "Address": address,
    "Price": price,
    "Description": description,
    "Room details": room_details,
    "Images": image_data
})
Next, open a CSV file in write mode, create a DictWriter object, and define the field names. property_data is a list of dictionaries, so you can use the keys of each dictionary as the column headers.
# ...
# ... export scraped data to CSV ... #
# open a CSV file for writing
with open("outputs.csv", mode="w", newline="", encoding="utf-8") as file:
# create a CSV writer object using DictWriter
writer = csv.DictWriter(file, fieldnames=property_data[0].keys())
Lastly, write the header row and the data rows.
    # ...
    # write the header row to the CSV file
    writer.writeheader()

    # write the data rows to the CSV file
    for data in property_data:
        writer.writerow(data)
That's it.
Now, combine the code snippets above to get the following complete code.
# import the required libraries
import requests
from bs4 import BeautifulSoup
import csv

# define target URL
url = "https://www.idealista.com/inmueble/105043094/"

# make a GET request to the target URL
response = requests.get(url)

# retrieve HTML content
html = response.text

# parse the raw HTML file
soup = BeautifulSoup(html, "html.parser")

# initialize an empty list to store all scraped data
property_data = []

# ... find and extract the address ... #
# select the span element containing the address and extract its text content
address = soup.find("span", {"class": "main-info__title-minor"}).text.strip()

# ... locate and get the price ... #
# select the span element containing the price and extract its text content
price = soup.find("span", {"class": "info-data-price"}).text.strip()

# ... locate and extract room details ... #
# initialize an empty list to store room details
room_details = []

# select the main container div
details_container = soup.find("div", {"class": "details-property"})

if details_container:
    # find all h2s and the corresponding feature divs
    headings = details_container.find_all("h2")
    features = details_container.find_all("div", {"class": "details-property_features"})

    # iterate through headings and features
    for heading, feature in zip(headings, features):
        property_feature = [li.text.strip() for li in feature.find_all("li")]
        room_details.append({
            "heading": heading.text.strip(),
            "features": property_feature
        })

# ... locate and extract property description ... #
# default value in case the description element is missing
description = ""

# select the description div container
description_container = soup.find("div", {"class": "comment"})

if description_container:
    # find the <p> tag within the container and extract its text content
    description = description_container.find("p").text.strip()

# ... locate and get property images ... #
# create an empty list to store image data
image_data = []

# select the image div containers
image_containers = soup.find_all("div", {"class": "placeholder-multimedia"})

if image_containers:
    # loop through each div
    for container in image_containers:
        # select the img tag and extract the src attribute
        image = container.find("img")
        image_data.append(image.get("src"))

# append scraped data to the property_data list
property_data.append({
    "Address": address,
    "Price": price,
    "Description": description,
    "Room details": room_details,
    "Images": image_data
})

# ... export scraped data to CSV ... #
# open a CSV file for writing
with open("outputs.csv", mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object using DictWriter
    writer = csv.DictWriter(file, fieldnames=property_data[0].keys())

    # write the header row to the CSV file
    writer.writeheader()

    # write the data rows to the CSV file
    for data in property_data:
        writer.writerow(data)
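If you saved the file as scraper.py, as suggested at the start of this tutorial, run it from your project directory:

python scraper.py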
This code scrapes the Idealista property target page and creates a new outputs.csv file in your project root directory, where it stores the scraped data in CSV format.
Here's a truncated screenshot sample for reference:
Scraping Multiple Idealista Listings
If you're interested in hundreds or thousands of Idealista listings, you'll need to scrape multiple pages. The process is similar to the previous steps, but this time you also have to handle pagination.
You have to navigate through each page and extract the necessary data. To achieve that, you must first understand the website's pagination structure.
Let's break it down into steps.
For this tutorial, we'll scrape Idealista listings based on geographical location. Also, we'll extract only the links for each listing to keep things simple.
Let's start by defining the base URL, which is your initial search result page.
Next, create a function to extract links from the current page using techniques similar to previous steps.
As before, inspect a listing to identify the target elements and selectors.
You'll find that each listing is an <article> tag, and the links are the href attributes of the anchor tags within each article.
Using this information, create a function to extract links. This function will select all anchor tags, loop through each, and extract their href attribute.
The href attributes are relative paths, so you must concatenate them with https://www.idealista.com to get the complete URL.
# import the required libraries
# ...
# define base URL
base_url = "https://www.idealista.com/en/venta-viviendas/barcelona/eixample/"
# function to scrape listing links from a single page
def extract_links(soup):
    # initialize an empty list to store links
    links = []

    # select all listings on the current page
    listings = soup.find_all("article")

    # loop through each listing and retrieve its link
    for listing in listings:
        # find the <a> tag and ensure it has an href attribute
        a_tag = listing.find("a")
        if a_tag and a_tag.get("href"):
            # extract the link, construct the full URL, and append it to the links list
            link = a_tag.get("href")
            links.append(f"https://www.idealista.com{link}")

    return links
This will return a list of links on the current page.
Next, create a function to handle pagination. To achieve this, browse through the page to identify the page's pagination structure.
You'll find the Next button at the bottom of the page to navigate the search result pages.
Starting from the initial search result, you can continuously click the Next button to access the following pages. In other words, the new page URL is the href attribute of the Next button.
Therefore, to create this function, locate the Next button and extract its href attribute. Again, inspect the page's pagination to identify the right selectors.
You'll notice that the Next button is an <li> element with the class next, and its anchor tag has the class icon-arrow-right-after.
Also, the href attributes here are relative paths. Hold onto this information; we'll use it in a later function.
Now, retrieve the next page link using the details above and check if the "Next page" is exhausted.
#...
# function to handle pagination
def get_next_page_url(soup):
    # identify the next button and retrieve its href attribute
    next_button = soup.find("a", {"class": "icon-arrow-right-after"})

    # check if the next page exists; otherwise, return None
    if next_button and "href" in next_button.attrs:
        return next_button["href"]
    return None
Next, using both functions above, create another function to extract links from each search result page.
In this function, scrape the current page, navigate to the next page, and repeat scraping until "Next page" is exhausted.
Also, add a small time delay between scraping pages to avoid triggering anti-bot restrictions. Remember that the href attributes of the Next button are relative paths, so concatenate them with https://www.idealista.com to get a complete URL.
# import the required libraries
#...
import time
# ...
# function to extract listing links from each search result page
def scrape_idealista_pages(base_url):
    current_url = base_url
    all_links = []

    # make a GET request to the current page
    while current_url:
        print(f"Scraping: {current_url}")
        response = requests.get(current_url)

        # parse the raw HTML of the current page
        soup = BeautifulSoup(response.text, "html.parser")

        # call the extract_links function to scrape links from the current page
        links = extract_links(soup)

        # add links to the all_links list after scraping the current page
        all_links.extend(links)
        print(f"Found {len(links)} links on this page.")

        # get the URL for the next page and stop scraping if there are no more pages
        next_page = get_next_page_url(soup)
        if next_page:
            # concatenate relative paths with the Idealista base URL
            current_url = f"https://www.idealista.com{next_page}"
        else:
            print("No more pages to scrape.")
            break

        # add a time delay after scraping each page
        time.sleep(2)

    return all_links
Lastly, call the scrape_idealista_pages() function to start the scraping process.
# ...
# start the scraping process
links = scrape_idealista_pages(base_url)
That's it.
Now, put all the steps together and handle your input.
# import the required libraries
import requests
from bs4 import BeautifulSoup
import csv
import time

# define base URL
base_url = "https://www.idealista.com/en/venta-viviendas/barcelona/eixample/"

# function to scrape listing links from a single page
def extract_links(soup):
    # initialize an empty list to store links
    links = []

    # select all listings on the current page
    listings = soup.find_all("article")

    # loop through each listing and retrieve its link
    for listing in listings:
        # find the <a> tag and ensure it has an href attribute
        a_tag = listing.find("a")
        if a_tag and a_tag.get("href"):
            # extract the link, construct the full URL, and append it to the links list
            link = a_tag.get("href")
            links.append(f"https://www.idealista.com{link}")

    return links

# function to handle pagination
def get_next_page_url(soup):
    # identify the next button and retrieve its href attribute
    next_button = soup.find("a", {"class": "icon-arrow-right-after"})

    # check if the next page exists; otherwise, return None
    if next_button and "href" in next_button.attrs:
        return next_button["href"]
    return None

# function to extract listing links from each search result page
def scrape_idealista_pages(base_url):
    current_url = base_url
    all_links = []

    # make a GET request to the current page
    while current_url:
        print(f"Scraping: {current_url}")
        response = requests.get(current_url)

        # parse the raw HTML of the current page
        soup = BeautifulSoup(response.text, "html.parser")

        # call the extract_links function to scrape links from the current page
        links = extract_links(soup)

        # add links to the all_links list after scraping the current page
        all_links.extend(links)
        print(f"Found {len(links)} links on this page.")

        # get the URL for the next page and stop scraping if there are no more pages
        next_page = get_next_page_url(soup)
        if next_page:
            # concatenate relative paths with the Idealista base URL
            current_url = f"https://www.idealista.com{next_page}"
        else:
            print("No more pages to scrape.")
            break

        # add a time delay after scraping each page
        time.sleep(2)

    return all_links

# start the scraping process
links = scrape_idealista_pages(base_url)

# export to CSV
with open("listing_links.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["link"])
    for link in links:
        writer.writerow([link])
Here's the result:
Congratulations! You now have an Idealista scraper capable of scraping multiple pages.
Easiest Solution to Scrape Idealista
Idealista uses advanced anti-bot detection techniques that can block your requests and deny you access. If you're facing this challenge, the ZenRows Scraper API offers a reliable and foolproof solution.
This tool provides everything you need to avoid detection while web scraping Idealista.
With features like premium proxies, JavaScript rendering, request header management, advanced anti-bot bypass, fingerprint spoofing, and more, ZenRows allows you to focus on extracting your desired data rather than the intricacies of circumventing anti-bot solutions.
Some additional benefits of using the ZenRows Scraper API include:
✅ 98.7% average success rate.
✅ Download structured data in JSON and easily store it in a usable format, such as CSV.
✅ Easy integration and 24/7 support.
✅ Scrape multiple pages at scale.
To use this tool, sign up to get your free API key.
You'll be redirected to the Request Builder page, where your ZenRows API key is at the top right.
Input your target URL and activate the Premium Proxies and JS Rendering mode.
Then, select the Python language option and choose the API option. ZenRows works with any language and provides ready-to-use snippets for the most popular ones.
Remember to select your desired output format. The autoparse option parses the HTML and returns a JSON result.
Lastly, copy the generated code on the right to your editor to test your code.
Using the same target property page as before, your code should look like this:
import requests
url = 'https://www.idealista.com/inmueble/105043094/'
apikey = '<YOUR_ZENROWS_API_KEY>'
params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
    'autoparse': 'true',
}
response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)
This code bypasses Idealista's anti-bot restrictions, retrieves the HTML, and automatically parses it to return the following JSON result with multiple fields.
[
    # ...
    {
        "price": "1350000",
        "characteristics": {
            "roomNumber": "3",
            "bathNumber": "2",
            "hasLift": "1",
            "hasParking": "0",
            "hasGarden": "0",
            "hasSwimmingPool": "0",
            "hasTerrace": "0",
            "constructedArea": "115"
        },
        # ... omitted for brevity ... #
    }
]
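Since autoparse already returns JSON, you can drop the result straight into a CSV file, just as you did earlier. Here's a minimal sketch, assuming the response body is a JSON list like the one above and that every item shares similar top-level keys (nested objects such as characteristics will be written as-is into a single cell):

import csv
import json

# parse the JSON string returned by the Scraper API
data = json.loads(response.text)

# write each record's top-level fields to a CSV file
with open("zenrows_output.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=data[0].keys(), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(data)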
Awesome! You're now well-equipped for web scraping Idealista at scale.
Conclusion
Scraping Idealista allows you to access critical data insights efficiently. While its anti-bot protections, dynamic HTML, and frequent layout changes can pose challenges, the right tools can make the process much smoother.
ZenRows simplifies Idealista web scraping by handling anti-bot measures and adapting to dynamic content effortlessly. For a seamless scraping experience, explore ZenRows today!