How to Export Data to CSV

Once you have extracted data with ZenRows, how do you store it? This guide shows how to transform JSON or HTML output into CSV files. For simplicity, we will fetch a single URL and store the results in a single file. In real-world cases, you might prefer to extract several URLs and store all the results together.

From JSON using Python

The first case stores the JSON output obtained via ZenRows with the "autoparse" feature enabled. We chose Zillow for the example, but autoparse works for many sites like YouTube, Instagram, or Yelp. We will use the Pandas library to convert (json_normalize) and save (to_csv) the data in CSV format. The json_normalize function flattens the given data, since the JSON response might have nested attributes.

# pip install requests pandas
import requests
import json
import pandas as pd

url = "https://www.zillow.com/san-francisco-ca/"
apikey = "YOUR_KEY"
# autoparse asks ZenRows to return structured JSON instead of raw HTML
params = {"autoparse": True, "url": url, "apikey": apikey}
response = requests.get("https://api.zenrows.com/v1/", params=params)

# Parse the JSON body into Python objects
content = json.loads(response.text)

# Flatten nested attributes into columns and write them to a CSV file
data = pd.json_normalize(content)
data.to_csv("result_zillow.csv", index=False)
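To see what the flattening step does without calling the API, here is a minimal sketch that runs json_normalize on a small in-memory sample. The records and their field names (address, price, latLong) are made up for illustration; the real autoparse output depends on the site being scraped.

```python
import pandas as pd

# Hypothetical sample shaped like a nested JSON response;
# real field names depend on the site being scraped.
sample = [
    {"address": "1 Main St", "price": 950000,
     "latLong": {"latitude": 37.77, "longitude": -122.42}},
    {"address": "2 Oak Ave", "price": 1200000,
     "latLong": {"latitude": 37.76, "longitude": -122.43}},
]

# Nested keys become dot-separated column names, e.g. "latLong.latitude"
flat = pd.json_normalize(sample)
columns = list(flat.columns)
print(columns)

# Without a file path, to_csv returns the CSV text instead of writing a file
csv_text = flat.to_csv(index=False)
print(csv_text)
```

Each nested attribute ends up as its own column, so every record maps to exactly one CSV row.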

We can also pass parameters to control how many nested levels get flattened and to rename some fields. In this case, we flatten only one inner level and remove the "latLong." prefix from the latitude and longitude fields. This is a typical case when scraping Zillow, for example.

data = pd.json_normalize(content, max_level=1).rename( 
	columns=lambda x: x.replace("latLong.", ""))
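The difference between the default flattening and max_level=1 is easier to see side by side. The following sketch uses a made-up record with two levels of nesting; the "details" and "rooms" keys are hypothetical and only serve to show where flattening stops.

```python
import pandas as pd

# Hypothetical record with two levels of nesting; real field names will differ.
sample = [{
    "address": "1 Main St",
    "latLong": {"latitude": 37.77, "longitude": -122.42},
    "details": {"rooms": {"bedrooms": 2, "bathrooms": 1}},
}]

# Default: every nested level is flattened into dot-separated columns
full = pd.json_normalize(sample)

# max_level=1: only the first inner level is flattened, so "details.rooms"
# keeps its dictionary as a single cell value; then drop the "latLong." prefix
one_level = pd.json_normalize(sample, max_level=1).rename(
    columns=lambda x: x.replace("latLong.", ""))

full_columns = list(full.columns)
one_level_columns = list(one_level.columns)
print(full_columns)
print(one_level_columns)
```

Limiting the depth keeps the CSV narrower, at the cost of leaving deeper structures serialized inside a single column.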

From HTML using Python

For this scenario, we picked AutoScout24 without autoparse. The API call will return plain HTML, and we are responsible for parsing it using BeautifulSoup. We create a dictionary for each car with basic info, such as make or price, and then pass the list of dictionaries to Pandas to build a DataFrame. That object exposes a function to store the data as CSV, just as in the case above. There is no need to flatten any structure this time, since we create the dictionaries ourselves and know that they have a single level.

# pip install requests beautifulsoup4 pandas
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.autoscout24.com/lst/?sort=age&desc=1"
apikey = "YOUR_KEY"
params = {"url": url, "apikey": apikey}
response = requests.get("https://api.zenrows.com/v1/", params=params)
soup = BeautifulSoup(response.content, "html.parser")

# Build one flat dictionary per car listing found on the page
content = [{
	"makemodel": car.select_one(".cldt-summary-makemodel").text,
	"version": car.select_one(".cldt-summary-version").text,
	"price": car.select_one('[data-item-name="price"]').text.split('-')[0].strip(),
	"link": car.select_one('[data-item-name="detail-page-link"]').get("href"),
} for car in soup.select(".cl-list-element.cl-list-element-gap")]

# Convert the list of dictionaries to a DataFrame and save it as CSV
data = pd.DataFrame(content)
data.to_csv("result_autoscout24.csv", index=False)
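As noted in the introduction, real-world jobs often span several URLs whose results belong in one file. A minimal sketch of that idea follows, assuming each page has already been fetched and parsed into a list of dictionaries like "content" above. The sample records and the combine_pages helper are hypothetical, and dropping rows with a repeated link is just one possible way to handle listings that show up on more than one page.

```python
import pandas as pd

# Hypothetical per-page results, each shaped like the "content" list above.
page_1 = [
    {"makemodel": "Audi A3", "version": "Sportback", "price": "18500", "link": "/offers/a3"},
]
page_2 = [
    {"makemodel": "BMW 320", "version": "Touring", "price": "21900", "link": "/offers/320"},
    {"makemodel": "Audi A3", "version": "Sportback", "price": "18500", "link": "/offers/a3"},
]

def combine_pages(pages):
    """Merge per-page lists of dicts into one DataFrame, dropping repeated links."""
    frames = [pd.DataFrame(page) for page in pages]
    combined = pd.concat(frames, ignore_index=True)
    # Keep the first occurrence of each link (an illustrative choice)
    return combined.drop_duplicates(subset="link").reset_index(drop=True)

combined = combine_pages([page_1, page_2])

# Pass a filename (for example "result_all.csv") to write to disk instead
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Because every page produces dictionaries with the same keys, the frames line up column by column and the combined file keeps a single header row.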

From JSON using JavaScript

Switching to JavaScript and Node.js, we will repeat the same examples. We need to install the json2csv library to handle the conversion from JSON to CSV. The fs module comes with Node and allows interacting with the file system.

After getting the data, we will parse it with a flatten transformer. As the name implies, it flattens the nested structures inside the JSON. Then, we save the file using writeFileSync.

// npm install zenrows json2csv
const fs = require("fs");
const {
	Parser,
	transforms: { flatten },
} = require("json2csv");
const { ZenRows } = require("zenrows");

(async () => {
	const client = new ZenRows("YOUR_KEY");
	const url = "https://www.zillow.com/san-francisco-ca/";

	// autoparse makes ZenRows return structured JSON instead of HTML
	const { data } = await client.get(url, { autoparse: "true" });

	// Flatten nested objects so each attribute becomes its own CSV column
	const parser = new Parser({ transforms: [flatten()] });
	const csv = parser.parse(data);

	fs.writeFileSync("result_zillow.csv", csv);
})();

From HTML using JavaScript

As with the Python example, we will use AutoScout24 to extract data from HTML without the autoparse feature. For that, we will get the plain result and load it into cheerio, which allows us to query elements as we would in the browser or with jQuery. We return an object with essential data for each car entry in the list. Then we convert that list into CSV using json2csv (no flatten is needed this time) and, lastly, store the result. These last two steps are similar to the autoparse case.

// npm install zenrows json2csv cheerio
const fs = require("fs");
const cheerio = require("cheerio");
const { Parser } = require("json2csv");
const { ZenRows } = require("zenrows");

(async () => {
	const client = new ZenRows("YOUR_KEY");
	const url = "https://www.autoscout24.com/lst/?sort=age&desc=1";

	// Without autoparse, data holds the plain HTML of the page
	const { data } = await client.get(url);
	const $ = cheerio.load(data);

	// Build one flat object per car listing found on the page
	const content = $(".cl-list-element.cl-list-element-gap").map((_, car) => ({
		makemodel: $(car).find(".cldt-summary-makemodel").text(),
		version: $(car).find(".cldt-summary-version").text(),
		price: $(car).find("[data-item-name='price']").text().split("-")[0].trim(),
		link: $(car).find("[data-item-name='detail-page-link']").attr("href"),
	}))
	.toArray();

	// The objects are already single-level, so no flatten transform is needed
	const parser = new Parser();
	const csv = parser.parse(content);

	fs.writeFileSync("result_autoscout24.csv", csv);
})();

If you run into any problem or cannot set up your scraper correctly, contact us and we'll help you.