Web Scraping With jQuery: A Complete Tutorial
In this jQuery web scraping tutorial, you'll learn how to build a jQuery web crawler. jQuery is one of the most popular JavaScript libraries. Specifically, jQuery makes HTML document traversal and manipulation easy.
That makes jQuery a great library for crawling web pages and performing web scraping. Here, you'll first see whether you can use jQuery for client-side scraping. Then, you'll learn how to use jQuery for server-side scraping.
Let's now create a jQuery scraper and achieve your data retrieval goals.
What is client-side scraping?
Client-side scraping involves performing web scraping directly in the browser. In other words, the frontend executes the scraping logic, typically through JavaScript. So, client-side scraping is about retrieving information from the Web right in your browser.
You can achieve client-side scraping by calling a public API or by parsing the HTML content of a webpage. Keep in mind that most websites don't offer public APIs. So, you generally have to download HTML documents and parse them to extract data.
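For instance, if a site exposed a public API, you could call it from the browser with jQuery's getJSON() helper. Here's a minimal sketch; the /api/products endpoint is a hypothetical example, not a real URL:
// call a hypothetical public API exposed by the site itself
$.getJSON("/api/products", function(products) {
  // log the name of each product returned by the API
  products.forEach(function(product) {
    console.log(product.name);
  });
});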
Let's now learn how to perform client-side scraping using jQuery!
How do I scrape a web page with jQuery?
First, you need to download the HTML content of your target webpage. Let's learn how to achieve this in jQuery. Specifically, let's fetch the https://google.com/ webpage and get its HTML content.
You can achieve this with the jQuery get() method. get() performs a GET HTTP request and exposes what the server returns in a callback. Use get() as follows:
$.get("https://google.com/", function(html) {
console.log(html);
});
Yet, this snippet won't work! That's because you'll get the "No 'Access-Control-Allow-Origin' header is present on the requested resource" CORS (Cross-Origin Resource Sharing) error.
This happens because your browser performs the HTTP request. For security reasons, modern browsers automatically set the Origin HTTP header. In detail, they place the domain you're running your request from in that header.
To comply with CORS rules, web servers can restrict which origins may access their resources, blocking requests from unwanted domains while allowing others. Thus, if your target server doesn't allow your domain, you'll get the CORS error seen above. That's why you can't scrape content client-side from other websites using JavaScript.
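To make the mechanism concrete, this is a minimal sketch of the server side of CORS, using Node.js's built-in http module; the allowed origin is a made-up example:
const http = require("http");

http
  .createServer((req, res) => {
    // allow browsers on this (made-up) origin to read the response
    res.setHeader("Access-Control-Allow-Origin", "https://allowed-domain.com");
    res.end("Hello!");
  })
  .listen(3000);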
So, the next question naturally arises.
What is the best way to scrape a website?
The answer is easy. As you just learned, client-side scraping is too limited for security reasons. At the time of writing, the most effective way to scrape a website is through server-side scraping.
By performing server-side scraping, you'll be able to avoid the CORS problems seen earlier. That's because your server will execute HTTP requests, not your browser. Thus, there will be no CORS problems.
You may think JavaScript is a frontend technology, and you can't use it on your server. That's not true. You can actually build a JS web scraper with Node.js.
Is this also true for jQuery?
Can you use jQuery with Node.js?
The short answer is yes. You can use jQuery in Node.js. All you have to do is install the jquery npm library with the following command:
npm install jquery
You can now use it to build a jQuery web spider. Let's learn how!
How can you use jQuery to scrape data from a website?
Here, you'll learn how to perform web scraping using jQuery on https://scrapeme.live/shop/.
The target webpage, scrapeme.live/shop, is a demo e-commerce site listing Pokémon products.
You can find the code of the demo jQuery web scraper in this GitHub repo. Clone it and install the project's dependencies with the following commands:
git clone https://github.com/Tonel/web-scraper-jquery
cd web-scraper-jquery
npm install
Then, launch the jQuery web spider with:
npm run start
Follow this tutorial and learn how to build a jQuery web scraper app with Node.js!
Prerequisites
- Node.js and npm >= 8.0
- jquery >= 3.6.1
- jsdom >= 20.0.0
If you don't have Node.js installed on your system, you can download it by following the link above.
jQuery requires a window with a document to work. Since no such window exists natively in Node.js, you can mock one with jsdom. If you don't know the project, jsdom is a JavaScript implementation of many web standards for Node.js. Specifically, its goal is to emulate a web browser for testing and scraping purposes.
You can then use jQuery in Node.js to perform scraping as follows:
const { JSDOM } = require("jsdom");

// initialize JSDOM on the "https://target-domain.com/" page
// to avoid CORS problems
const { window } = new JSDOM("", {
  url: "https://target-domain.com/",
});
const $ = require("jquery")(window);

// scrape the https://target-domain.com/ web pages ...
Note that you must specify the url option while initializing JSDOM to avoid CORS issues. Learn more about it here.
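To verify what the url option does, you can inspect the location of the mocked window; a quick sketch:
const { JSDOM } = require("jsdom");

const { window } = new JSDOM("", {
  url: "https://target-domain.com/",
});

// the mocked window now "lives" on the target domain, so the
// requests jQuery fires from it won't be treated as cross-origin
console.log(window.location.href); // "https://target-domain.com/"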
Retrieve the HTML document with the jQuery get() function
As mentioned earlier, you can download an HTML document with the jQuery get() function.
const { JSDOM } = require("jsdom");

// initialize JSDOM on the "https://scrapeme.live/" page
// to avoid CORS problems
const { window } = new JSDOM("", {
  url: "https://scrapeme.live/",
});
const $ = require("jquery")(window);

$.get("https://scrapeme.live/shop/", function(html) {
  console.log(html);
});
This will print:
<!doctype html>
<html lang="en-GB">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2.0">
  <link rel="profile" href="http://gmpg.org/xfn/11">
  <link rel="pingback" href="https://scrapeme.live/xmlrpc.php">
  <title>Products – ScrapeMe</title>
  <!-- omitted for brevity ... -->
That's exactly what the https://scrapeme.live/shop/ HTML content looks like!
Extract the desired HTML element in jQuery with find()
Now, let's retrieve the info associated with every product. Right-click on a product HTML element and open the DevTools window by selecting the "Inspect" option. There, you'll see the HTML of the selected product element.
As you can see, li.product is the CSS selector that identifies the product elements. You can retrieve the list of these HTML elements with find() as follows:
$.get("https://scrapeme.live/shop/", function(html) {
  // retrieve the list of all HTML products
  const productHTMLElements = $(html).find("li.product");
});
In detail, the find() jQuery function returns the set of DOM elements that match the CSS selector, jQuery object, or HTML element passed as a parameter.
$.get("https://scrapeme.live/shop/", function(html) {
// retrieve the list of all HTML products
const productHTMLElements = $(html).find("li.product");
});
Note that each product HTML element contains a URL, a name, an image, and a price. You can find this info in an a, img, h2, and span HTML element, respectively. You can extract this data with the jQuery find() function as below:
$.get("https://scrapeme.live/shop/", function(html) {
// retrieve the list of all HTML products
const productHTMLElements = $(html).find("li.product");
const products = [];
// populate products with the scraped data
productHTMLElements.each((i, productHTML) => {
// scrape data from the product HTML element
const product = {
name: $(productHTML).find("h2").text(),
url: $(productHTML).find("a").attr("href"),
image: $(productHTML).find("img").attr("src"),
price: $(productHTML).find("span").first().text(),
};
products.push(product);
});
console.log(JSON.stringify(products));
// store the product data on a db ...
});
As you can see, using the jQuery attr() and text() functions, you can get all the data you need in only a few lines of code. In detail, attr() returns the value of the HTML attribute passed as a parameter. In contrast, text() returns all the text contained in the selected HTML element.
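If you've never used them, this quick sketch shows the difference between the two on a standalone element built from one of the URLs above:
// build a detached element to showcase attr() vs text()
const link = $('<a href="https://scrapeme.live/shop/Bulbasaur/">Bulbasaur</a>');

console.log(link.attr("href")); // "https://scrapeme.live/shop/Bulbasaur/"
console.log(link.text()); // "Bulbasaur"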
When run, the jQuery scraper above would print:
[
  {
    "name": "Bulbasaur",
    "url": "https://scrapeme.live/shop/Bulbasaur/",
    "image": "https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png",
    "price": "£63.00"
  },
  {
    "name": "Ivysaur",
    "url": "https://scrapeme.live/shop/Ivysaur/",
    "image": "https://scrapeme.live/wp-content/uploads/2018/08/002-350x350.png",
    "price": "£87.00"
  },
  // ...
  {
    "name": "Beedrill",
    "url": "https://scrapeme.live/shop/Beedrill/",
    "image": "https://scrapeme.live/wp-content/uploads/2018/08/015-350x350.png",
    "price": "£168.00"
  },
  {
    "name": "Pidgey",
    "url": "https://scrapeme.live/shop/Pidgey/",
    "image": "https://scrapeme.live/wp-content/uploads/2018/08/016-350x350.png",
    "price": "£159.00"
  }
]
At this point, you should save the scraped data to a database. Also, you can extend your crawling logic to go through all paginated pages, as shown in this web crawling tutorial in JavaScript.
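As a starting point, here's a sketch of what that pagination logic might look like; it assumes the "next page" link matches the a.next CSS selector, which you should verify in the DevTools:
function scrapePage(pageUrl) {
  $.get(pageUrl, function(html) {
    // scrape the products of the current page as shown above ...

    // follow the "next page" link, if any
    // (assumption: the site marks it with the "a.next" selector)
    const nextUrl = $(html).find("a.next").attr("href");
    if (nextUrl) {
      scrapePage(nextUrl);
    }
  });
}

scrapePage("https://scrapeme.live/shop/");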
Et voilà! You just learned how to scrape https://scrapeme.live/shop/ to retrieve all product info.
Get the HTML element content with the jQuery html() function
When scraping, consider storing the original HTML of each DOM element of interest. This makes running new scraping processes on the same elements easier in the future. You can achieve this with the jQuery html() function as below:
const product = {
  name: $(productHTML).find("h2").text(),
  url: $(productHTML).find("a").attr("href"),
  image: $(productHTML).find("img").attr("src"),
  price: $(productHTML).find("span").first().text(),
  // store the original HTML content
  html: $(productHTML).html(),
};
For Blastoise, this would contain:
{
  "name": "Blastoise",
  "url": "https://scrapeme.live/shop/Blastoise/",
  "image": "https://scrapeme.live/wp-content/uploads/2018/08/009-350x350.png",
  "price": "£76.00",
  "html": "\n\t<a href=\"https://scrapeme.live/shop/Blastoise/\" class=\"woocommerce-LoopProduct-link woocommerce-loop-product__link\"><img width=\"324\" height=\"324\" src=\"https://scrapeme.live/wp-content/uploads/2018/08/009-350x350.png\" class=\"attachment-woocommerce_thumbnail size-woocommerce_thumbnail wp-post-image\" alt=\"\" srcset=\"https://scrapeme.live/wp-content/uploads/2018/08/009-350x350.png 350w, https://scrapeme.live/wp-content/uploads/2018/08/009-150x150.png 150w, https://scrapeme.live/wp-content/uploads/2018/08/009-300x300.png 300w, https://scrapeme.live/wp-content/uploads/2018/08/009-100x100.png 100w, https://scrapeme.live/wp-content/uploads/2018/08/009-250x250.png 250w, https://scrapeme.live/wp-content/uploads/2018/08/009.png 475w\" sizes=\"(max-width: 324px) 100vw, 324px\"><h2 class=\"woocommerce-loop-product__title\">Blastoise</h2>\n\t<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">£</span>76.00</span></span>\n</a><a href=\"/shop/?add-to-cart=736\" data-quantity=\"1\" class=\"button product_type_simple add_to_cart_button ajax_add_to_cart\" data-product_id=\"736\" data-product_sku=\"5212\" aria-label=\"Add “Blastoise” to your basket\" rel=\"nofollow\">Add to basket</a>"
}
Note that the html field stores the original HTML content. If you wanted to retrieve more data from it, you could now do it without having to crawl the entire website again.
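For example, if you later decided to also extract each product's "Add to basket" URL, you could parse the stored snippet directly. A sketch, based on the a.add_to_cart_button element visible in the output above:
// re-parse the stored HTML snippet without a new HTTP request;
// wrapping it in a <div> lets find() search the whole fragment
const productFragment = $("<div>" + product.html + "</div>");
const addToBasketUrl = productFragment.find("a.add_to_cart_button").attr("href");
console.log(addToBasketUrl); // "/shop/?add-to-cart=736"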
Use regex in jQuery
One of the best ways to retrieve the data of interest from an HTML document is through jQuery regex. A regex, or regular expression, is a sequence of characters that defines a text search pattern.
For example, let's assume you want to retrieve the price of each product element. If the <span> element containing the price didn't have a unique CSS class, extracting this info might become challenging. You can achieve it by using regex in jQuery as below:
const prices = new Set();

// use a regex to identify price span HTML elements
$(html).find("span").each((i, spanHTMLElement) => {
  // keep only HTML elements whose text is a price
  if (/^£\d+\.\d{2}$/.test($(spanHTMLElement).text())) {
    // add the scraped price to the prices set
    prices.add($(spanHTMLElement).text());
  }
});

// use the price data to achieve something ...
At the end of the loop, prices will contain the following results:
["£0.00","£63.00","£87.00","£105.00","£48.00","£165.00","£156.00","£130.00","£123.00","£76.00","£73.00","£148.00","£162.00","£25.00","£168.00","£159.00"]
These are exactly the prices contained on the webpage.
Congrats! You just learned how to master all the building blocks to build a jQuery web scraper.
What are the benefits of jQuery for web scraping?
Considering how popular jQuery is, chances are that you are familiar with it. In detail, you're likely to know how to use jQuery to traverse the DOM. That's the main benefit of using jQuery for web scraping.
After all, scraping is about selecting HTML elements and extracting data from them. You've done most of the work if you already use jQuery to retrieve HTML elements.
Also, jQuery is one of the most adopted libraries for DOM manipulation. This is because it has many features to extract and change data in the DOM effortlessly. This makes it a perfect tool for scraping.
jQuery is so powerful that it doesn't need other dependencies to perform web scraping! In detail, jQuery provides everything you need to build a complete scraping application. However, you might prefer to use an HTTP client such as Axios with it. Learn more about web scraping with Axios.
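For instance, you could let Axios take care of the HTTP requests and keep jQuery for parsing only. A minimal sketch, assuming axios is installed:
const axios = require("axios");
const { JSDOM } = require("jsdom");

// jQuery still needs a mocked window, but no url option is required
// here because Axios, not jQuery, performs the HTTP request
const { window } = new JSDOM("");
const $ = require("jquery")(window);

axios.get("https://scrapeme.live/shop/").then((response) => {
  // extract the product names from the downloaded HTML
  const productNames = $(response.data)
    .find("li.product h2")
    .map((i, h2) => $(h2).text())
    .get();
  console.log(productNames);
});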
Conclusion
Here, you learned everything you should know about web scraping in jQuery, from basic to advanced techniques. As shown above, building a web scraper in jQuery isn't that difficult, but doing it client-side has limitations.
All you need to avoid the client-side limitations is to use jQuery with Node.js, and here you saw how to do that. Specifically, you learned:
- Why client-side scraping may not be possible
- How to use jQuery with Node.js
- How to perform web scraping with find() and by using regex in jQuery
- Why jQuery is an excellent tool for web scraping
If you liked this, take a look at the JavaScript Web Scraping guide.
Thanks for reading! We hope that you found this guide helpful. You can sign up for free, try ZenRows, and let us know any questions, comments, or suggestions.
Did you find the content helpful? Spread the word and share it on Twitter, LinkedIn, or Facebook.