Considering Dart for your next web scraping project? Good idea. The language's CLI scripting capabilities and intuitive syntax make it an excellent tool for the job.
This step-by-step tutorial will guide you through building a complete web scraping script in Dart, using http, html, and puppeteer.
Let's dive in!
Is Dart Good for Web Scraping?
Yes, you can scrape data from web pages with Dart!
When it comes to web scraping, most developers consider Python or JavaScript a no-brainer thanks to their huge communities. While Dart may not be the first language that comes to mind for web scraping, it's still a fantastic choice for at least three reasons:
- It's a rising language developed and endorsed by Google.
- It has an intuitive, concise, and easy-to-understand syntax, which is excellent for scripting.
- It features a complete standard API and several high-quality external libraries for web development.
Thanks to its ease of use and rich ecosystem, Dart is more than just a viable option for web scraping!
Prerequisites
Prepare your Dart environment for web scraping with the http and html packages.
Install Dart
To use Dart locally, you need to install the Dart SDK. The recommended installation procedure on the official site is using a package manager.
On Windows, install Dart through Chocolatey with this command in an elevated terminal:
choco install dart-sdk
The procedure is slightly longer on macOS and Linux. For more details, follow the official installation guide.
Run the command below to make sure Dart is working:
dart --version
This should produce the following output:
Dart SDK version: 3.3.3 (stable) (Tue Mar 26 14:21:33 2024 +0000) on "windows_x64"
Awesome! Dart is ready to use.
Create Your Dart Project
Launch the dart create command to initialize a Dart CLI project called web_scraper:
dart create web_scraper
The web_scraper folder will now contain your Dart web scraping project.
Load your project in a Dart IDE. Visual Studio Code with the Dart extension will be a great choice.
Take a look at the web_scraper.dart file in the /bin folder:
import 'package:web_scraper/web_scraper.dart' as web_scraper;
void main(List<String> arguments) {
print('Hello world: ${web_scraper.calculate()}!');
}
This is the main file in your Dart project. As you can see, it imports a package from the web_scraper.dart file in the /lib folder. Open it, and you'll see:
int calculate() {
return 6 * 7;
}
In the Dart project:
- /bin is the folder for the public entry points to compile to executable binaries.
- /lib is the folder that contains all the rest of the code.
So, the web_scraper.dart file in the /lib folder will contain the scraping logic. Then, the web_scraper.dart file in the /bin folder will import and run it.
Run the Dart application using this command:
dart run
The command will print some logs followed by the desired output:
Hello world: 42!
Well done! Follow the next section to turn this project into a Dart web scraping application.
How to Do Web Scraping With Dart
In this guided section, you’ll build a Dart scraper to extract all product data from a site. The scraping target will be ScrapeMe, an e-commerce platform with a paginated list of Pokémon products:
Get ready to perform web scraping in Dart!
Step 1: Scrape by Requesting Your Target Page
The easiest way to connect to a web page and retrieve its HTML source code is to use an HTTP client. http is Dart's most popular HTTP client library. Add it to your project's dependencies with the following command:
dart pub add http
Open the pubspec.yaml file in the root folder of your project. Under the dependencies section, you'll see:
dependencies:
http: ^1.2.1
Import http in the web_scraper.dart file in /lib and initialize an async function where you'll use it:
import 'package:http/http.dart' as http;
Future scrape() async {
// ...
}
Use the Uri.parse() method to create a Uri object for your target page. Pass it to the http.get() method to make a GET request to the specified page. Then, retrieve the HTML document from the server response and print it:
import 'package:http/http.dart' as http;
Future scrape() async {
// create a Uri object to the target page
final pageUri = Uri.parse('https://scrapeme.live/shop/');
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
// response and print it
final html = response.body;
print(html);
}
Turn main() in the /bin/web_scraper.dart file into an async function and call scrape():
import 'package:web_scraper/web_scraper.dart' as web_scraper;
void main(List<String> arguments) async {
await web_scraper.scrape();
}
Run your Dart web scraping script, and it’ll print:
<!doctype html>
<html lang="en-GB">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2.0">
<link rel="profile" href="http://gmpg.org/xfn/11">
<link rel="pingback" href="https://scrapeme.live/xmlrpc.php">
<!-- Omitted for brevity... -->
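One thing worth knowing: http's get() doesn't throw on HTTP error statuses such as 404 or 500; it simply returns the response. Below is a minimal sketch of a hypothetical fetchHtml() helper (the name is ours, not part of the tutorial) that fails fast on non-200 responses:

```dart
import 'package:http/http.dart' as http;

// hypothetical helper: fetch a page and fail fast on
// non-200 responses instead of parsing an error page
Future<String> fetchHtml(String url) async {
  final response = await http.get(Uri.parse(url));
  if (response.statusCode != 200) {
    throw Exception('Request failed with status: ${response.statusCode}');
  }
  return response.body;
}
```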
Fantastic! Your script can retrieve the target page. Now it's time to extract some data.
Step 2: Extract Data From One Element
To scrape data from a webpage, you must parse its HTML content with an HTML parser. html is a powerful Dart HTML parser with a rich API for DOM traversal and manipulation. Install it with this command:
dart pub add html
Import it by adding the following line on top of web_scraper.dart in /lib:
import 'package:html/parser.dart' as html_parser;
Next, feed the HTML content of the page to the parse() function to get a Document object. This contains all the methods required to select the nodes on the page and perform web scraping in Dart:
final document = html_parser.parse(html);
You now need to define an effective HTML node selection strategy. The idea is to select the HTML elements of interest from the DOM and retrieve data from them. To tackle this task, you have to inspect the web page's HTML source code.
Visit the target page of your script in the browser and inspect a product HTML node with the DevTools:
Expand the HTML code. You can select the product node with the CSS selector below:
li.product
li is the tag of the product HTML element, while product is its class.
Given a product node, you can find the following:
- The URL in an <a> node.
- The image URL in an <img> node.
- The name in an <h2> node.
- The price in a <span> node.
You have all the information you need to implement the web scraping logic. Use the querySelector() method to apply a CSS selector on the page. Then, extract data from the selected nodes:
// select the first product HTML element on the page
final productHTMLElement = document.querySelector('li.product');
// scraping logic
final url = productHTMLElement?.querySelector('a')?.attributes['href'];
final image = productHTMLElement?.querySelector('img')?.attributes['src'];
final name = productHTMLElement?.querySelector('h2')?.text;
final price = productHTMLElement?.querySelector('span')?.text;
The text attribute contains the text nested in the element. attributes returns a map with the name-value pairs for the HTML attributes in the node.
Log the scraped data with some print() instructions:
print(url);
print(image);
print(name);
print(price);
Your web_scraper.dart file in /lib will now contain:
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as html_parser;
Future scrape() async {
// create a Uri object to the target page
final pageUri = Uri.parse('https://scrapeme.live/shop/');
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
final html = response.body;
// parse the HTML document
final document = html_parser.parse(html);
// select the first product HTML element on the page
final productHTMLElement = document.querySelector('li.product');
// scraping logic
final url = productHTMLElement?.querySelector('a')?.attributes['href'];
final image = productHTMLElement?.querySelector('img')?.attributes['src'];
final name = productHTMLElement?.querySelector('h2')?.text;
final price = productHTMLElement?.querySelector('span')?.text;
// print the scraped data
print(url);
print(image);
print(name);
print(price);
}
Launch it, and it'll produce this output:
https://scrapeme.live/shop/Bulbasaur/
https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png
Bulbasaur
£63.00
Wonderful! The scraping logic works like a charm. Now, let's learn how to scrape all the products on the page.
Step 3: Extract Data From All Elements
Before extending the script, you need a data structure representing the scraped data.
Define a new class called Product on top of the web_scraper.dart file in /lib:
class Product {
String? url;
String? image;
String? name;
String? price;
Product(this.url, this.image, this.name, this.price);
}
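As an optional touch the tutorial doesn't require, you could override toString() so that printing a Product directly yields readable output:

```dart
class Product {
  String? url;
  String? image;
  String? name;
  String? price;
  Product(this.url, this.image, this.name, this.price);

  // optional: human-readable output when printing a Product
  @override
  String toString() => 'Product(url: $url, name: $name, price: $price)';
}
```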
In the scrape() function, initialize an empty list of Product objects. This is where you'll store the objects populated with the data collected from the page:
final List<Product> products = [];
Now, use querySelectorAll() instead of querySelector() to select all product nodes. Iterate over them, apply the scraping logic, instantiate a Product object, and add it to the list:
final productHTMLElements = document.querySelectorAll('li.product');
// iterate over the product nodes and apply
// the scraping logic
for (final productHTMLElement in productHTMLElements) {
// scraping logic
final url = productHTMLElement.querySelector('a')?.attributes['href'];
final image = productHTMLElement.querySelector('img')?.attributes['src'];
final name = productHTMLElement.querySelector('h2')?.text;
final price = productHTMLElement.querySelector('span')?.text;
// instantiate a Product object
// and add it to the list
final product = Product(url, image, name, price);
products.add(product);
}
Print the scraped data to make sure the Dart web scraping logic works as intended:
for (final product in products) {
print(product.url);
print(product.image);
print(product.name);
print(product.price);
print('');
}
Your /lib/web_scraper.dart script will now contain:
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as html_parser;
// representation of the product object to
// scrape from the page
class Product {
String? url;
String? image;
String? name;
String? price;
Product(this.url, this.image, this.name, this.price);
}
Future scrape() async {
// create a Uri object to the target page
final pageUri = Uri.parse('https://scrapeme.live/shop/');
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
final html = response.body;
// parse the HTML document
final document = html_parser.parse(html);
// where to store the scraped data
final List<Product> products = [];
// select the product HTML elements on the page
final productHTMLElements = document.querySelectorAll('li.product');
// iterate over the product nodes and apply
// the scraping logic
for (final productHTMLElement in productHTMLElements) {
// scraping logic
final url = productHTMLElement.querySelector('a')?.attributes['href'];
final image = productHTMLElement.querySelector('img')?.attributes['src'];
final name = productHTMLElement.querySelector('h2')?.text;
final price = productHTMLElement.querySelector('span')?.text;
// instantiate a Product object
// and add it to the list
final product = Product(url, image, name, price);
products.add(product);
}
// print the scraped data
for (final product in products) {
print(product.url);
print(product.image);
print(product.name);
print(product.price);
print('');
}
}
Run it, and it'll return:
https://scrapeme.live/shop/Bulbasaur/
https://scrapeme.live/wp-content/uploads/2018/08/001-350x350.png
Bulbasaur
£63.00
// omitted for brevity...
https://scrapeme.live/shop/Pidgey/
https://scrapeme.live/wp-content/uploads/2018/08/016-350x350.png
Pidgey
£159.00
There you go! The scraped objects match the products on the page and contain the desired data.
Step 4: Export Your Data to a CSV File
The most straightforward way to convert the collected data to CSV format is using csv. This package provides a comprehensive API to convert a list of lists of values to a CSV string and vice versa.
Install csv in your Dart project:
dart pub add csv
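If you'd like to get a feel for the API before wiring it into the scraper, here's a minimal, standalone round-trip example (independent of the tutorial's code) showing both directions of the conversion:

```dart
import 'package:csv/csv.dart';

void main() {
  // rows -> CSV string
  final rows = [
    ['name', 'price'],
    ['Bulbasaur', '£63.00'],
  ];
  final csvString = const ListToCsvConverter().convert(rows);
  print(csvString);

  // CSV string -> rows
  final parsed = const CsvToListConverter().convert(csvString);
  print(parsed);
}
```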
Then, import it in the /lib/web_scraper.dart file. You'll also need to import the Dart io library:
import 'package:csv/csv.dart' as csv;
import 'dart:io';
Transform each Product object in products into a list of strings. Pass the result to the convert() method of ListToCsvConverter to get a CSV string. Create a products.csv file and populate it with that data using writeAsStringSync():
// convert the scraped products to a
// list of list of strings
final List<List<String?>> productStrings = products
.map((product) =>
[product.url, product.image, product.name, product.price])
.toList();
// append the header row
productStrings.insert(0, ['url', 'image', 'name', 'price']);
// convert to CSV format
final csvContent = const csv.ListToCsvConverter().convert(productStrings);
// export the CSV string to a file
final file = File('products.csv');
file.writeAsStringSync(csvContent);
Put it all together, and you'll get:
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as html_parser;
import 'package:csv/csv.dart' as csv;
import 'dart:io';
// representation of the product object to
// scrape from the page
class Product {
String? url;
String? image;
String? name;
String? price;
Product(this.url, this.image, this.name, this.price);
}
Future scrape() async {
// create a Uri object to the target page
final pageUri = Uri.parse('https://scrapeme.live/shop/');
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
final html = response.body;
// parse the HTML document
final document = html_parser.parse(html);
// where to store the scraped data
final List<Product> products = [];
// select the product HTML elements on the page
final productHTMLElements = document.querySelectorAll('li.product');
// iterate over the product nodes and apply
// the scraping logic
for (final productHTMLElement in productHTMLElements) {
// scraping logic
final url = productHTMLElement.querySelector('a')?.attributes['href'];
final image = productHTMLElement.querySelector('img')?.attributes['src'];
final name = productHTMLElement.querySelector('h2')?.text;
final price = productHTMLElement.querySelector('span')?.text;
// instantiate a Product object
// and add it to the list
final product = Product(url, image, name, price);
products.add(product);
}
// convert the scraped products to a
// list of list of strings
final List<List<String?>> productStrings = products
.map((product) =>
[product.url, product.image, product.name, product.price])
.toList();
// append the header row
productStrings.insert(0, ['url', 'image', 'name', 'price']);
// convert to CSV format
final csvContent = const csv.ListToCsvConverter().convert(productStrings);
// export the CSV string to a file
final file = File('products.csv');
file.writeAsStringSync(csvContent);
}
Launch the Dart web scraping script:
dart run
Wait for the script to complete, and a products.csv file will appear in the project's folder. Open it, and you'll see:
Et voilà! You’ve just performed web scraping in Dart.
Dart for Advanced Web Scraping
Now that you know the basics, you're ready to dive into more advanced Dart web scraping techniques.
How to Scrape Multiple Pages With Dart
The current CSV file contains 16 records, corresponding to the products on the target site's home page.
To scrape all products on the site, you need to do web crawling, which means discovering web pages as you scrape data. Learn more in our guide on web crawling vs. web scraping.
The steps to implement web crawling are as follows:
- Visit a webpage.
- Discover new URLs from the pagination HTML links and add them to a queue.
- Repeat the cycle on a new page picked from the queue.
This loop stops when the Dart scraping script has visited all pagination pages on the site. As this is just a demo script, limit the pages to crawl to 5. This way, you can speed up the process and avoid making too many requests to the destination server.
You already know how to carry out step 1. You need to learn how to extract URLs from the pagination links. First, inspect these HTML elements on the page:
Notice that you can select each pagination link with the following CSS selector:
a.page-numbers
Adding those links to a queue without some extra logic isn't a good approach. The reason is that you don't want the script to visit the same page twice. To make the crawling logic more efficient, use these two additional data structures:
- pagesDiscovered: a HashSet storing the URLs discovered by the crawling logic.
- pagesToScrape: a Queue containing the URLs of the pages the script will visit soon.
Initialize both with the URL of the first product pagination page (Queue comes from dart:collection, so remember to import it, as in the full script below):
final firstPageToScrape = 'https://scrapeme.live/shop/page/1/';
final pagesDiscovered = {firstPageToScrape};
final pagesToScrape = Queue<String>();
pagesToScrape.add(firstPageToScrape);
Then, use those data structures in a while loop to implement the Dart crawling logic:
// counter for the current iteration
var visitedPages = 1;
// max number of pages to visit
final limit = 5;
// until there are no pages to scrape
// or the limit is hit
while (pagesToScrape.isNotEmpty && visitedPages <= limit) {
// get the next page to scrape
final currentPage = pagesToScrape.removeFirst();
// transform the page URL string into a Uri
final pageUri = Uri.parse(currentPage);
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
final html = response.body;
// parse the HTML document
final document = html_parser.parse(html);
// select the pagination links
final paginationHTMLElements = document.querySelectorAll('a.page-numbers');
// logic to avoid visiting a page twice
if (paginationHTMLElements.isNotEmpty) {
for (final paginationHTMLElement in paginationHTMLElements) {
// get the current pagination URL
final newPaginationLink = paginationHTMLElement.attributes['href'];
if (newPaginationLink != null) {
// if the page discovered is new
if (!pagesDiscovered.contains(newPaginationLink)) {
// if the page discovered needs to be scraped
if (!pagesToScrape.contains(newPaginationLink)) {
pagesToScrape.add(newPaginationLink);
}
pagesDiscovered.add(newPaginationLink);
}
}
}
}
// scraping logic...
// increment the limit counter
visitedPages++;
}
Integrate the above snippet into /lib/web_scraper.dart, and you'll get:
import 'dart:collection';
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as html_parser;
import 'package:csv/csv.dart' as csv;
import 'dart:io';
// representation of the product object to
// scrape from the page
class Product {
String? url;
String? image;
String? name;
String? price;
Product(this.url, this.image, this.name, this.price);
}
Future scrape() async {
// where to store the scraped data
final List<Product> products = [];
// the URL of the first page
// to scrape data from
final firstPageToScrape = 'https://scrapeme.live/shop/page/1/';
// data structures for web scraping
final pagesDiscovered = {firstPageToScrape};
final pagesToScrape = Queue<String>();
pagesToScrape.add(firstPageToScrape);
// counter for the current iteration
var visitedPages = 1;
// max number of pages to visit
final limit = 5;
// until there are no pages to scrape
// or the limit is hit
while (pagesToScrape.isNotEmpty && visitedPages <= limit) {
// get the next page to scrape
final currentPage = pagesToScrape.removeFirst();
// transform the page URL string into a Uri
final pageUri = Uri.parse(currentPage);
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
final html = response.body;
// parse the HTML document
final document = html_parser.parse(html);
// select the pagination links
final paginationHTMLElements = document.querySelectorAll('a.page-numbers');
// logic to avoid visiting a page twice
if (paginationHTMLElements.isNotEmpty) {
for (final paginationHTMLElement in paginationHTMLElements) {
// get the current pagination URL
final newPaginationLink = paginationHTMLElement.attributes['href'];
if (newPaginationLink != null) {
// if the page discovered is new
if (!pagesDiscovered.contains(newPaginationLink)) {
// if the page discovered needs to be scraped
if (!pagesToScrape.contains(newPaginationLink)) {
pagesToScrape.add(newPaginationLink);
}
pagesDiscovered.add(newPaginationLink);
}
}
}
}
// select the product HTML elements on the page
final productHTMLElements = document.querySelectorAll('li.product');
// iterate over the product nodes and apply
// the scraping logic
for (final productHTMLElement in productHTMLElements) {
// scraping logic
final url = productHTMLElement.querySelector('a')?.attributes['href'];
final image = productHTMLElement.querySelector('img')?.attributes['src'];
final name = productHTMLElement.querySelector('h2')?.text;
final price = productHTMLElement.querySelector('span')?.text;
// instantiate a Product object
// and add it to the list
final product = Product(url, image, name, price);
products.add(product);
}
// increment the limit counter
visitedPages++;
}
// convert the scraped products to a
// list of list of strings
final List<List<String?>> productStrings = products
.map((product) =>
[product.url, product.image, product.name, product.price])
.toList();
// append the header row
productStrings.insert(0, ['url', 'image', 'name', 'price']);
// convert to CSV format
final csvContent = const csv.ListToCsvConverter().convert(productStrings);
// export the CSV string to a file
final file = File('products.csv');
file.writeAsStringSync(csvContent);
}
Now, run the Dart web scraping script again:
dart run
This time, the scraper will scrape data from 5 different product pagination pages. The new CSV file will contain more than 16 records:
Congrats! You’ve just learned how to perform web crawling and web scraping in Dart!
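One caveat before adapting this crawler to other sites: pagination hrefs are often relative (e.g., /shop/page/2/). A hedged way to normalize them before queueing, assuming currentPage holds the page's absolute URL, is Uri's resolve():

```dart
// normalize a possibly relative href against the current page URL
final absoluteLink =
    Uri.parse(currentPage).resolve(newPaginationLink).toString();
```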
Avoid Getting Blocked When Scraping With Dart
Everyone knows how valuable data is, even if it's publicly available on a website. No one wants to give it away for free, and that's why anti-bot technologies have become so popular. The goal of these systems is to detect and block automated scripts, such as yours.
These systems pose the biggest challenge to web scraping with Dart. However, the right countermeasures will let you scrape without getting blocked.
Two ways of eluding less sophisticated anti-bots are:
- Setting a real User Agent header.
- Using a proxy to hide your IP.
Follow the instructions below to integrate them into your Dart web scraping script.
Proxy integration in http isn't possible. You must use the HttpClient class from dart:io and pass it to http's IOClient.
Get the User Agent of a real browser and the URL of a proxy server from a site such as Free Proxy List. Then, use them when targeting sites protected with anti-bot solutions:
import 'package:http/io_client.dart' as http;
import 'dart:io';
Future scrape() async {
// initialize a Dart IO HTTP client
HttpClient httpClient = HttpClient();
// configure the proxy server
String proxy = '204.12.6.21:4311';
httpClient.findProxy = (uri) {
return 'PROXY $proxy;';
};
// avoid SSL certificate errors
httpClient.badCertificateCallback =
(X509Certificate cert, String host, int port) => true;
// use the Dart IO HTTP client to
// initialize a package:http client
final proxyHttpClient = http.IOClient(httpClient);
// custom User-Agent value
final userAgentString = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36';
// perform the GET request through the proxy
var response = await proxyHttpClient.get(
Uri.parse('https://your-target-site.com'),
headers: {'User-Agent': userAgentString}
);
// retrieve the HTML from the server
// response and print it
final html = response.body;
print(html);
}
By the time you read this tutorial, the chosen proxy server will likely no longer work. Free proxies are short-lived, unreliable, and data-greedy; they're only good for learning purposes. Don't use them in production!
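If you do experiment with free proxies anyway, a common pattern is to rotate through a small pool so a single dead proxy doesn't break every request. Here's a hedged sketch; the addresses below are placeholders, not working proxies:

```dart
import 'dart:io';
import 'dart:math';
import 'package:http/io_client.dart' as http;

// placeholder proxy addresses -- replace with live ones
final proxyPool = [
  '203.0.113.10:8080',
  '203.0.113.20:3128',
];

// build a client that routes traffic through a randomly picked proxy
http.IOClient buildProxiedClient() {
  final httpClient = HttpClient();
  final proxy = proxyPool[Random().nextInt(proxyPool.length)];
  httpClient.findProxy = (uri) => 'PROXY $proxy;';
  return http.IOClient(httpClient);
}
```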
Those two tips are great for bypassing simple anti-bot measures. But what about advanced solutions such as Cloudflare?
Unfortunately, a complete WAF like that can still easily detect your Dart web scraping script as a bot. Verify that by running the above script against this Cloudflare-protected page:
https://www.g2.com/products/notion/reviews
This time, the result will be the following 403 Forbidden page:
<!DOCTYPE html>
<html class="no-js" lang="en-US">
<head>
<title>Attention Required! | Cloudflare</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<!-- omitted for brevity -->
Should you give up? Not at all! What you need is a web scraping API, such as ZenRows, which supports User Agent and IP rotation and comes with the best anti-bot toolkit.
Let’s see how you can boost your Dart scraping script with ZenRows. Sign up for free to redeem your first 1,000 credits and reach the Request Builder page:
Use the G2.com page mentioned earlier as the target site:
- Paste the target URL (https://www.g2.com/products/notion/reviews) into the "URL to Scrape" input.
- Enable the "JS Rendering" mode (User Agent rotation and the AI-powered anti-bot toolkit are always included by default).
- Toggle the "Premium Proxy" check to get rotating IPs.
- Select “cURL” and then the “API” mode to get the ZenRows API URL to call in your script.
Pass the generated URL to the get() method:
import 'package:http/http.dart' as http;
Future scrape() async {
// create a Uri object to the target page
final pageUri = Uri.parse('https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fnotion%2Freviews&js_render=true&premium_proxy=true');
// perform a GET request to the target page
var response = await http.get(pageUri);
// retrieve the HTML from the server
// response and print it
final html = response.body;
print(html);
}
Run the script. This time, it'll return the HTML associated with the target G2 page as desired:
<!DOCTYPE html>
<head>
<meta charset="utf-8" />
<link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
<title>Notion Reviews 2024: Details, Pricing, & Features | G2</title>
<!-- omitted for brevity ... -->
Bye-bye, 403 errors. That’s how easy it is to use ZenRows for web scraping in Dart.
How to Use a Headless Browser With Dart
The Dart html package is an HTML parser and can only deal with static web pages. If your target site uses JavaScript to dynamically load or render data, you need another solution.
In particular, you must use a tool that can render pages in a controllable browser. The most popular headless browser library in Dart is puppeteer, a port of the Puppeteer Node.js library. For a complete tutorial, read the guide to Puppeteer web scraping.
To better showcase Puppeteer's capabilities in Dart, we need a new target page. Let's use one that requires JavaScript execution, such as the Infinite Scrolling demo, which dynamically loads new products as the user scrolls down:
Add puppeteer to your project's dependencies with this command:
dart pub add puppeteer
Next, use it to scrape data from a dynamic content page:
import 'package:puppeteer/puppeteer.dart';
Future<void> scrape() async {
// open a new page in the controlled browser
var browser = await puppeteer.launch();
var page = await browser.newPage();
// visit the target page
await page.goto('https://scrapingclub.com/exercise/list_infinite_scroll/');
// select all product HTML elements
var productElements = await page.$$('.post');
// iterate over them and extract the desired data
for (var productElement in productElements) {
// select the name and price elements
var nameElement = await productElement.$('h4');
var priceElement = await productElement.$('h5');
// extract their data
var name = (await nameElement.evaluate('e => e.textContent')).toString().trim();
var price = (await priceElement.evaluate('e => e.textContent')).toString().trim();
// print it
print(name);
print(price);
print('');
}
// release the browser resources
await browser.close();
}
Run this script:
dart run
It'll produce:
Short Dress
$24.99
// omitted for brevity...
Fitted Dress
$34.99
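Note that the script above only extracts the products rendered on the first load. To collect the items that appear on scroll, you could ask the browser to scroll before selecting the nodes. Here's a hedged sketch to place right after the goto() call; the iteration count and delay are arbitrary assumptions and may need tuning:

```dart
// scroll to the bottom a few times so the page loads more products
// (5 iterations and a 1-second delay are arbitrary choices)
for (var i = 0; i < 5; i++) {
  await page.evaluate('() => window.scrollTo(0, document.body.scrollHeight)');
  // give the page time to fetch and render the next batch
  await Future.delayed(Duration(seconds: 1));
}
```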
Congratulations! You're now a Dart web scraping champion.
Conclusion
This step-by-step guide walked you through the process of web scraping in Dart. You've learned the fundamentals as well as more advanced techniques.
Dart is a rising language with a rich ecosystem of libraries for extracting data from the Web. The duo http and html enables you to do web scraping and crawling in Dart on static pages. Plus, you have access to puppeteer for dealing with sites that use JavaScript.
The main challenge? No matter how good your Dart scraper is, anti-scraping systems can still stop it. Elude them all with ZenRows, a scraping API with the most effective built-in anti-bot bypass capabilities. Try ZenRows with a free trial today!