Web Scraping With jQuery: A Complete Tutorial

In this web scraping jQuery tutorial, you'll learn how to build a jQuery web crawler. jQuery is one of the most popular JavaScript libraries. Specifically, jQuery enables HTML document traversal and manipulation.

This makes jQuery a natural fit for crawling web pages and performing web scraping. Here, you'll first see whether you can use jQuery for client-side scraping. Then, you'll learn how to use jQuery for server-side scraping.

Let's now create a jQuery scraper and achieve your data retrieval goals.

What is client-side scraping?

Client-side scraping involves performing web scraping techniques directly in the browser. In other words, the frontend executes the scraping logic, typically through JavaScript. So, client-side scraping is about retrieving information from the Web right in your browser.

You can achieve client-side scraping by calling a public API or by parsing the HTML content of a webpage. Keep in mind that most websites don't offer public APIs. So, you generally have to download HTML documents and parse them to extract data.
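
For example, if your target site exposes a public JSON API, you can call it straight from the browser with jQuery's getJSON() method. The snippet below is only a sketch: the endpoint URL and the response fields are hypothetical placeholders.

// hypothetical public API call performed in the browser 
// (the endpoint and the response fields are placeholders) 
$.getJSON("https://api.example.com/products", function(products) { 
	products.forEach((product) => { 
		console.log(product.name, product.price); 
	}); 
});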

Let's now learn how to perform client-side scraping using jQuery!

How do I scrape a web page with jQuery?

First, you need to download the HTML content of your target webpage. Let's learn how to achieve this in jQuery. Specifically, let's fetch the https://google.com/ webpage and get its HTML content.

You can achieve this with the jQuery get() method. get() performs a GET HTTP request and exposes what the server returns in a callback. Use get() as follows:

$.get("https://google.com/", function(html) { 
	console.log(html); 
});

Yet, this snippet won't work! That's because the request fails with a CORS (Cross-Origin Resource Sharing) error: "No 'Access-Control-Allow-Origin' header is present on the requested resource."

This happens because your browser is performing the HTTP request. For security reasons, modern browsers automatically set the Origin HTTP header, placing in it the domain your request comes from.

To comply with CORS rules, web servers must explicitly state which origins are allowed to read their responses. This blocks requests from unwanted domains while letting others through. Thus, if your target server doesn't allow your domain, you'll get the CORS error seen above. That's why you can't generally scrape content client-side from other websites using JavaScript.
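
To make the server side of this concrete, below is a minimal sketch, using Node.js's built-in http module, of what a server has to do to opt in. Unless the response carries an Access-Control-Allow-Origin header matching your origin (or *), the browser refuses to expose the response to your script.

const http = require("http"); 
 
// a server that opts in to CORS: without the header below, 
// browsers block cross-origin scripts from reading its responses 
http.createServer((req, res) => { 
	res.setHeader("Access-Control-Allow-Origin", "*"); // allow any origin 
	res.end("Hello!"); 
}).listen(3000);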

This leads naturally to the next question.

What is the best way to scrape a website?

The answer is easy. As you just learned, client-side scraping is too limited for security reasons. At the time of writing, the most effective way to scrape a website is through server-side scraping.

By performing server-side scraping, you'll be able to avoid the CORS problems seen earlier. That's because your server will execute HTTP requests, not your browser. Thus, there will be no CORS problems.

You may think JavaScript is a frontend technology, and you can't use it on your server. That's not true. You can actually build a JS web scraper with Node.js.

Is this also true for jQuery?

Can you use jQuery with Node.js?

The short answer is yes. You can use jQuery in Node.js. All you have to do is install the jquery npm library with the following command:

npm install jquery

You can now use it to build a jQuery web spider. Let's learn how!

Frustrated that your web scrapers are blocked again and again? ZenRows API handles rotating proxies and headless browsers for you.
Try for FREE

How can you use jQuery to scrape data from a website?

Here, you'll learn how to perform web scraping using jQuery on https://scrapeme.live/shop/.

That's what the target webpage looks like:

A general view of scrapeme.live/shop

You can find the code of the demo jQuery web scraper in this GitHub repo. Clone it and install the project's dependencies with the following commands:

git clone https://github.com/Tonel/web-scraper-jquery 
cd web-scraper-jquery 
npm install

Then, launch the jQuery web spider with:

npm run start

Follow this tutorial and learn how to build a jQuery web scraper app with Node.js!

Prerequisites

Here's what you need for the simple jQuery scraper to work:

  • Node.js and npm
  • the jquery npm package
  • the jsdom npm package

If you don't have Node.js installed on your system, you can download it from the official Node.js website.

jQuery requires a window with a document to work. Since no such window exists natively in Node.js, you can mock one with jsdom. If you don't know the project, jsdom is a JavaScript implementation of many web standards for Node.js. Specifically, its goal is to emulate a web browser for testing and scraping purposes.
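
jsdom is available on npm. If it isn't already among your project's dependencies, install it alongside jquery:

npm install jsdom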

You can then use jQuery in Node.js to perform scraping as follows:

const { JSDOM } = require( "jsdom" ); 
// initialize JSDOM in the "https://target-domain.com/" page 
// to avoid CORS problems 
const { window } = new JSDOM("", { 
	url: "https://target-domain.com/", 
}); 
const $ = require( "jquery" )( window ); 
 
// scraping https://target-domain.com/ web pages

Note that you must specify the url option when initializing JSDOM to avoid CORS issues. You can learn more about this option in the jsdom documentation.

Retrieve the HTML document with the jQuery get() function

As mentioned earlier, you can download an HTML document with the jQuery get() function.

 
const { JSDOM } = require( "jsdom" ); 
// initialize JSDOM in the "https://scrapeme.live/" page 
// to avoid CORS problems 
const { window } = new JSDOM("", { 
	url: "https://scrapeme.live/", 
}); 
const $ = require( "jquery" )( window ); 
 
$.get("https://scrapeme.live/shop/", function(html) { 
	console.log(html); 
});

This will print:

 
<!doctype html> 
<html lang="en-GB"> 
<head> 
<meta charset="UTF-8"> 
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2.0"> 
<link rel="profile" href="http://gmpg.org/xfn/11"> 
<link rel="pingback" href="https://scrapeme.live/xmlrpc.php"> 
 
<title>Products – ScrapeMe</title> 
<!-- omitted for brevity ... -->

That's exactly what the https://scrapeme.live/shop/ HTML content looks like!

Extract the desired HTML element in jQuery with find()

Now, let's retrieve the info associated with every product. Right-click on a product HTML element. Then, open the DevTools window by selecting the "Inspect" option. That's what you should get:

The DevTools window after selecting a product HTML element

As you can see, li.product is the CSS selector that identifies the product elements. You can retrieve the list of these HTML elements with find() as follows:

$.get("https://scrapeme.live/shop/", function(html) { 
	const productList = $(html).find("li.product"); 
});

In detail, the jQuery find() function returns the set of descendant DOM elements that match the CSS selector, jQuery object, or HTML element passed as a parameter.

Note that each product HTML element contains a name, a URL, an image, and a price. You can find this info in the h2, a, img, and span HTML elements, respectively. You can extract this data with the jQuery find() function as below:

$.get("https://scrapeme.live/shop/", function(html) { 
	// retrieve the list of all HTML products 
	const productHTMLElements = $(html).find("li.product"); 
 
	const products = []; 
	 
	// populate products with the scraped data 
	productHTMLElements.each((i, productHTML) => { 
		// scrape data from the product HTML element 
		const product = { 
			name: $(productHTML).find("h2").text(), 
			url: $(productHTML).find("a").attr("href"), 
			image: $(productHTML).find("img").attr("src"), 
			price: $(productHTML).find("span").first().text(), 
		}; 
 
		products.push(product); 
	}); 
 
	console.log(JSON.stringify(products)); 
 
	// store the product data on a db ... 
});

As you can see, the jQuery attr() and text() functions get you all the data you need in only a few lines of code. In detail, attr() returns the value of the HTML attribute passed as a parameter. In contrast, text() returns all the text contained in the selected HTML element.
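
To make the difference concrete, here's a tiny hypothetical illustration, where element is a DOM node containing only <a href="https://scrapeme.live/shop/Bulbasaur/">Bulbasaur</a>:

// attr() reads the value of an attribute 
$(element).find("a").attr("href"); // "https://scrapeme.live/shop/Bulbasaur/" 
// text() reads the text content 
$(element).find("a").text(); // "Bulbasaur"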

When run, the complete scraper above would print:

[ 
	{ 
		"name": "Bulbasaur", 
		"url": "https://scrapeme.live/shop/Bulbasaur/", 
		"image": "https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png", 
		"price": "£63.00" 
	}, 
	{ 
		"name": "Ivysaur", 
		"url": "https://scrapeme.live/shop/Ivysaur/", 
		"image": "https://scrapeme.live/wp-content/uploads/2018/08/002-350x350.png", 
		"price": "£87.00" 
	}, 
 
	// ... 
 
	{ 
		"name": "Beedrill", 
		"url": "https://scrapeme.live/shop/Beedrill/", 
		"image": "https://scrapeme.live/wp-content/uploads/2018/08/015-350x350.png", 
		"price": "£168.00" 
	}, 
	{ 
		"name": "Pidgey", 
		"url": "https://scrapeme.live/shop/Pidgey/", 
		"image": "https://scrapeme.live/wp-content/uploads/2018/08/016-350x350.png", 
		"price": "£159.00" 
	} 
]

At this point, you should save the scraped data to a database. Also, you can extend your crawling logic to go through all the paginated pages, as shown in this web crawling tutorial in JavaScript; a minimal sketch of the idea follows.
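
Here's one possible sketch of that pagination logic. It assumes the pagination links on the target site match the a.page-numbers CSS selector, so verify the actual selector in DevTools before relying on it:

// sketch: visit every paginated page exactly once 
const toVisit = ["https://scrapeme.live/shop/"]; 
const visited = new Set(); 
 
function crawlNext() { 
	const pageUrl = toVisit.shift(); 
	// stop when there is nothing left to visit 
	if (!pageUrl) return; 
	// skip pages that were already scraped 
	if (visited.has(pageUrl)) return crawlNext(); 
	visited.add(pageUrl); 
 
	$.get(pageUrl, function(html) { 
		// scrape the products on this page as shown above ... 
 
		// enqueue the pagination links found on this page 
		$(html).find("a.page-numbers").each((i, link) => { 
			const url = $(link).attr("href"); 
			if (url && !visited.has(url)) toVisit.push(url); 
		}); 
 
		crawlNext(); 
	}); 
} 
 
crawlNext();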

Et voilà! You just learned how to scrape https://scrapeme.live/shop/ to retrieve all product info.

Get the HTML element content with the jQuery html() function

When scraping, consider storing the original HTML of each DOM element of interest. This makes running scraping processes on the same elements easier in the future. You can achieve this with the jQuery html() function as below:

const product = { 
	name: $(productHTML).find("h2").text(), 
	url: $(productHTML).find("a").attr("href"), 
	image: $(productHTML).find("img").attr("src"), 
	price: $(productHTML).find("span").first().text(), 
	// store the original HTML content 
	html: $(productHTML).html() 
};

For Blastoise, this would contain:

{ 
	"name": "Blastoise", 
	"url": "https://scrapeme.live/shop/Blastoise/", 
	"image": "https://scrapeme.live/wp-content/uploads/2018/08/009-350x350.png", 
	"price": "£76.00", 
	"html": "\n\t<a href=\"https://scrapeme.live/shop/Blastoise/\" class=\"woocommerce-LoopProduct-link woocommerce-loop-product__link\"><img width=\"324\" height=\"324\" src=\"https://scrapeme.live/wp-content/uploads/2018/08/009-350x350.png\" class=\"attachment-woocommerce_thumbnail size-woocommerce_thumbnail wp-post-image\" alt=\"\" srcset=\"https://scrapeme.live/wp-content/uploads/2018/08/009-350x350.png 350w, https://scrapeme.live/wp-content/uploads/2018/08/009-150x150.png 150w, https://scrapeme.live/wp-content/uploads/2018/08/009-300x300.png 300w, https://scrapeme.live/wp-content/uploads/2018/08/009-100x100.png 100w, https://scrapeme.live/wp-content/uploads/2018/08/009-250x250.png 250w, https://scrapeme.live/wp-content/uploads/2018/08/009.png 475w\" sizes=\"(max-width: 324px) 100vw, 324px\"><h2> class=\"woocommerce-loop-product__title\">Blastoise</h2>\n\t<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span>76.00 class=\"woocommerce-Price-currencySymbol\">£</span>76.00</span></span>\n</a><a> href=\"/shop/?add-to-cart=736\" data-quantity=\"1\" class=\"button product_type_simple add_to_cart_button ajax_add_to_cart\" data-product_id=\"736\" data-product_sku=\"5212\" aria-label=\"Add “Blastoise” to your basket\" rel=\"nofollow\">Add to basket</a>" 
}

Note that the html field stores the original HTML content. If you wanted to retrieve more data from it, you could now do it without having to crawl the entire website again.
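
For instance, assuming product is one of the objects scraped above, you could later re-parse its stored html field to extract an attribute you skipped the first time, such as the image's srcset:

// re-parse the stored HTML snippet without re-downloading the page 
const srcset = $(product.html).find("img").attr("srcset");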

Use regex in jQuery

One of the best ways to retrieve the data of interest from an HTML document is through jQuery regex. A regex, or regular expression, is a sequence of characters that defines a text search pattern.

For example, let's assume you want to retrieve the price of each product element. If the <span> element containing the price didn't have a unique CSS class, extracting this info could become challenging. You can achieve it by using a regex in jQuery as below:

const prices = new Set(); 
// use a regex to identify price span HTML elements 
$(html).find("span").each((i, spanHTMLElement) => { 
	// keep only HTML elements whose text is a price 
	if (/^£\d+\.\d{2}$/.test($(spanHTMLElement).text())) { 
		// add the scraped price to the prices set 
		prices.add($(spanHTMLElement).text()); 
	} 
}); 
 
// use the price data to achieve something ...

At the end of the loop, the prices set will contain the following values:

["£0.00","£63.00","£87.00","£105.00","£48.00","£165.00","£156.00","£130.00","£123.00","£76.00","£73.00","£148.00","£162.00","£25.00","£168.00","£159.00"]

These are exactly the prices contained on the webpage.

Congrats! You've just mastered all the building blocks required for a jQuery web scraper.

What are the benefits of jQuery for web scraping?

Considering how popular jQuery is, chances are that you are familiar with it. In detail, you're likely to know how to use jQuery to traverse the DOM. That's the main benefit of using jQuery for web scraping.

After all, scraping is about selecting HTML elements and extracting data from them. You've done most of the work if you already use jQuery to retrieve HTML elements.

Also, jQuery is one of the most adopted libraries for DOM manipulation. This is because it has many features to extract and change data in the DOM effortlessly. This makes it a perfect tool for scraping.

jQuery is so powerful that it needs no other dependencies to perform web scraping, aside from jsdom to provide a window in Node.js. In detail, jQuery provides everything you need to build a complete scraping application. However, you might prefer to pair it with an HTTP client such as Axios. Learn more about web scraping with Axios.
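
As a quick illustration of that last option, here's a minimal sketch that pairs Axios with the jsdom-backed jQuery instance set up earlier. This is a design choice rather than a requirement, since $.get alone already covers the HTTP part:

// fetch the HTML with Axios, then hand it to jQuery for parsing 
const axios = require("axios"); 
 
axios.get("https://scrapeme.live/shop/").then((response) => { 
	// response.data contains the raw HTML of the page 
	const productHTMLElements = $(response.data).find("li.product"); 
	// extract name, url, image, and price as shown earlier ... 
});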

Conclusion

Here, you learned everything you should know about web scraping in jQuery, from basic to advanced techniques. As shown above, building a web scraper in jQuery isn't that difficult, but doing it client-side has limitations.

All you need to avoid the client-side limitations is to use jQuery with Node.js, and here you saw how to do that.

Specifically, in this article you learned:
  • Why client-side scraping may not be possible
  • How to use jQuery with Node.js
  • How to perform web scraping with find() and regexes in jQuery
  • Why jQuery is an excellent tool for web scraping

If you liked this, take a look at the JavaScript Web Scraping guide.

Thanks for reading! We hope that you found this guide helpful. You can sign up for free, try ZenRows, and let us know any questions, comments, or suggestions.

Did you find the content helpful? Spread the word and share it on Twitter, LinkedIn, or Facebook.

Want to keep learning?

We will be sharing all the insights we have learned through the years in the following blog posts. If you don't want to miss a piece and keep learning, we'd be thrilled to have you in our newsletter.

No spam guaranteed. You can unsubscribe at any time.