Web scraping with Axios and Cheerio

Web scraping is the process of extracting content and data from a website. There are innumerable reasons to scrape a website. Today we'll learn how to do it with the help of Axios, an awesome HTTP client that works in Node.js and the browser.

What does Axios do?

Axios allows you to make requests to websites and servers in a way similar to how a browser does it. But instead of presenting the results visually, Axios allows you to manipulate the response using code. This is very useful in the context of web scraping.

In this tutorial, we'll scrape the contents of ScrapeMe, an e-commerce site designed to be scraped, to learn the process. Specifically, we will extract the names of some products and their prices. By using the technique you will learn in this article, you can apply the power of Axios web scraping to many websites.

ScrapeMe Site
The product list we'll scrape

Before getting started

What is the purpose of Axios web scraping?

Sometimes, you can get the data you need from a web API in a structured way, for example in a JSON format. On many occasions, the only way to access certain data is by getting it from a public website. This can be a costly and time-consuming task, but Axios web scraping allows you to automate this process so you can get the data and content from a website efficiently.

If you are a frontend developer who has used Axios for ages, you may wonder whether it is possible to use Axios to scrape a website. It's actually a great fit: it runs in the browser and on Node.js, has great support for TypeScript, has solid documentation, and there are lots of examples on the web.

Frustrated that your web scrapers are blocked once and again? ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

Initial setup

Create a new folder for your Axios web scraping project. I named it scraper, but I'm sure you can come up with a more imaginative name. Open your terminal in that folder, and execute the following commands to set up a new npm package and install Axios on it.

npm init -y 
npm install axios

Using your favorite IDE or code editor, create a new file named index.js at the root of that folder and paste the following code into it:

const axios = require('axios'); 
axios.get('https://scrapeme.live/shop/') 
	.then(({ data }) => console.log(data));

We need to tweak our package.json file to be able to run our code. Inside the scripts section, we will add a new script to be able to run our index.js file. Your package.json file should look similar to this:

{ 
	"name": "scraper", 
	"version": "1.0.0", 
	"description": "", 
	"main": "index.js", 
	"scripts": { 
		"test": "echo \"Error: no test specified\" && exit 1", 
		"start": "node index.js" 
	}, 
	"keywords": [], 
	"author": "", 
	"license": "ISC", 
	"dependencies": { 
		"axios": "^0.27.2" 
	} 
}

Type npm start in your terminal to run the code. If it works, you will see all the HTML of the webpage printed in your terminal.

HTML Output

That was easy! But not very useful, because the information we want is mixed in with the rest of the site's markup. This is where Cheerio comes in handy. It's a library that provides an efficient implementation of core jQuery designed specifically for the server.

What does Cheerio do?

In the context of Axios web scraping, Cheerio is useful to select specific HTML elements and extract their information. Then you can organize or transform that information according to your needs. In our case, we will use it to get the names and prices of the products on our target website.

Using Cheerio with Axios

First, install it by running the following command in the same folder that we have been using:

npm install cheerio

Now we need to tell Cheerio which pieces of information we are interested in. For that, inspect the webpage using your browser's developer tools, for example in Chrome:

ScrapeMe Product Example

As you can see, the elements that contain the names of the products carry the class woocommerce-loop-product__title. Let's modify our index.js file to select those elements:

const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://scrapeme.live/shop/')
	.then(({ data }) => {
		const $ = cheerio.load(data);

		const pokemonNames = $('.woocommerce-loop-product__title')
			.map((_, product) => $(product).text())
			.toArray();
		console.log(pokemonNames);
	});

Are you ready to do your first bit of Axios web scraping? Run the code, and if everything went right, you will see a nice list of products on the screen.

ScrapeMe Product Names Output

Let's modify our code a little to also get the prices of the products. First, we will target a parent element in the DOM that contains both the name and the price of each product. li.product seems to be enough.

In this case, the class we want is woocommerce-Price-amount, so update your index.js file to look similar to this:

const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://scrapeme.live/shop/')
	.then(({ data }) => {
		const $ = cheerio.load(data);

		const pokemons = $('li.product')
			.map((_, pokemon) => {
				const $pokemon = $(pokemon);
				const name = $pokemon.find('.woocommerce-loop-product__title').text();
				const price = $pokemon.find('.woocommerce-Price-amount').text();
				return { name, price };
			})
			.toArray();
		console.log(pokemons);
	});

Running this will output a nice list of products retrieved using Axios web scraping.

ScrapeMe Product List Output

What is the website that you're trying to scrape?

The technique we just used works fine for simple websites. However, others will try to block Axios web scraping. In those cases, it is useful to make our requests look similar to those done by actual browsers.

One of the most basic verifications websites perform is checking the User-Agent header. This is a string that informs the server about the operating system, vendor, and version of the requesting user agent. User-agent strings come in different forms. For example, in my case I got:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36

This "presentation card" tells a website it is being accessed by Chrome version 104 on an Intel Mac running macOS Catalina.

To send additional headers using Axios, we can pass an additional config parameter to the request method. We can check that Axios is sending these headers using httpbin. So let's create a new file named headers.js in the root folder of our project with the following content:

const axios = require('axios'); 
 
const config = { 
	headers: { 
		'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', 
	}, 
}; 
 
axios.get('https://httpbin.org/headers', config) 
	.then(({ data }) => { 
		console.log(data) 
	});

We also need to create a new command in our package.json file. You can name it headers like in the following snippet:

{ 
	"name": "scraper", 
	"version": "1.0.0", 
	"description": "", 
	"main": "index.js", 
	"scripts": { 
		"test": "echo \"Error: no test specified\" && exit 1", 
		"start": "node index.js", 
		"headers": "node headers.js" 
	}, 
	"keywords": [], 
	"author": "", 
	"license": "ISC", 
	"dependencies": { 
		"axios": "^0.27.2", 
		"cheerio": "^1.0.0-rc.12" 
	} 
}

Run it in the terminal with the command npm run headers, and the header we sent should appear on the screen:

User Agent Output

To improve your chances of not being blocked, it's better to send additional headers besides the User-Agent. You can visit httpbin with your browser to check all the headers it sends. It should return something similar to this:

{ 
	"headers": { 
		"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
		"Accept-Encoding": "gzip, deflate, br", 
		"Accept-Language": "en-US,en;q=0.9", 
		"Host": "httpbin.org", 
		"Sec-Ch-Ua": "\"Chromium\";v=\"104\", \" Not A;Brand\";v=\"99\", \"Google Chrome\";v=\"104\"", 
		"Sec-Ch-Ua-Mobile": "?0", 
		"Sec-Ch-Ua-Platform": "\"macOS\"", 
		"Sec-Fetch-Dest": "document", 
		"Sec-Fetch-Mode": "navigate", 
		"Sec-Fetch-Site": "none", 
		"Sec-Fetch-User": "?1", 
		"Upgrade-Insecure-Requests": "1", 
		"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36", 
		"X-Amzn-Trace-Id": "Root=1-630ce94f-289bc1c678d8cd7153e42f5a" 
	} 
}

You can start adding some of those headers to check if you have good results with the website you are trying to scrape. If you want to check the complete code of the examples we have used, you can find it on this Axios scraping repository.

Avoiding blocks

Keep in mind that header checking is not the only method websites use to identify crawlers and scrapers. For example, if you make too many requests from the same IP address, you can also get blocked or banned. In that case, you can configure Axios to use a proxy server that will act as an intermediary between your Axios web scraping script and the website host.

Another common problem arises when the target website is a Single Page Application, for example when you use Axios to scrape a React website. In that case, Axios gets the HTML code sent by the server, but it might not contain the data you need, because that data is fetched by JavaScript code running in the browser of the final user. In those cases, using a headless browser or a web scraping API is probably a better solution.

Tip: For more information on how to configure Axios to use a proxy, and how to scrape using a headless browser, read our guide on Web Scraping with Javascript and NodeJS.

Conclusion

As we saw in this tutorial, web scraping can be easy or challenging, depending on the website you are trying to scrape. Sometimes, you just need to:
  1. Fire the request with Axios.
  2. Get the data in a callback.
  3. Select the relevant parts with Cheerio.
  4. Format the information with vanilla JavaScript.

But if the target website implements anti-scraping measures, you need to use more advanced techniques. Some of these techniques include adding HTTP headers or using proxies.

If you prefer to focus on generating value from the data you scrape instead of trying to bypass anti-bot systems, try a web scraping API together with Axios.

ZenRows offers a professional and easy-to-use suite of tools for web scraping. The best part? Every plan includes Automatic Data Extraction, Smart Rotating Proxies, Anti-Bot & CAPTCHA Bypass, and JavaScript Rendering, among other useful features. Try it for free now.

Did you find the content helpful? Spread the word and share it on Twitter, LinkedIn, or Facebook.


Want to keep learning?

We will be sharing all the insights we have learned through the years in the following blog posts. If you don't want to miss a piece and keep learning, we'd be thrilled to have you in our newsletter.

No spam guaranteed. You can unsubscribe at any time.