
7 Best C# Web Scraping Libraries in 2024

October 20, 2023 · 14 min read

Depending on your project, you should consider different C# web scraping libraries. You'll find options best suited for price tracking, lead generation, sentiment monitoring, financial data aggregation, etc.

Furthermore, you should keep different metrics in mind when making your pick. So, to make your choice easier, we reviewed and compared the seven best C# web scraping libraries to use.

Also, we'll see examples to help you understand how these frameworks work.

Let's dive in!

What Are the Best C# Web Scraping Libraries?

We tested different libraries for web scraping, and the best ones to use in 2024 are as follows:

  1. ZenRows.
  2. Puppeteer-Sharp.
  3. Selenium.
  4. HTML Agility Pack.
  5. ScrapySharp.
  6. IronWebScraper.
  7. HttpClient.

We compared the C# libraries based on the core features that make web scraping smooth: proxy configuration, dynamic content support, documentation, anti-bot bypass, auto-parsing, and infrastructure scalability.

Here is a quick overview.

| | ZenRows | Puppeteer-Sharp | Selenium | HTML Agility Pack | ScrapySharp | IronWebScraper | HttpClient |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Proxy Configuration | AUTO | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL |
| Dynamic Content | ✅ | ✅ | ✅ | - | - | - | - |
| Documentation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anti-Bot Bypass | ✅ | - | - | - | - | - | - |
| Auto-Parse HTML | ✅ | - | - | - | - | - | - |
| Infrastructure Scalability | AUTO | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL |

Let's go ahead and talk about these C# crawler libraries based on the features they come with and how they can extract data from a web page. We'll use the ScrapeMe website as a reference.

ScrapeMe Webpage and Source

1. ZenRows Web Scraper API

ZenRows API is the best C# web scraping library on this list. It's an API that handles anti-bot bypass for you, from rotating proxies and headless browsers to CAPTCHAs. Additionally, it supports auto-parsing (i.e., HTML to JSON parsing) for many popular sites and can extract dynamic content.

The only downside is that ZenRows doesn't ship a dedicated C# NuGet package for its web scraping API. Therefore, you need to send the HTTP requests to it with a general-purpose client.

How to Scrape a Web Page in C# with ZenRows?

Create a free account on ZenRows to get your API key. You'll get to the following Request Builder screen.

ZenRows Dashboard

After you add the URL you want to crawl, send an HTTP GET request to the generated API endpoint. The response is plain HTML that you can process with any HTML parser.

ZenRows Generating API URL

If you used https://scrapeme.live/shop as the URL to scrape, the ZenRows API URL should look like this.

Terminal
https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fscrapeme.live%2Fshop

Remark: the API Key is a personal ID assigned to you by ZenRows and shouldn't be shared with anyone.
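If you prefer to build that URL in code rather than copy it from the Request Builder, here's a quick sketch that uses Uri.EscapeDataString to URL-encode the target (the API key is still a placeholder):

program.cs
// assemble the ZenRows API URL; <YOUR_ZENROWS_API_KEY> is a placeholder
var apiKey = "<YOUR_ZENROWS_API_KEY>";
var targetUrl = "https://scrapeme.live/shop";
var zenRowsUrl = $"https://api.zenrows.com/v1/?apikey={apiKey}&url={Uri.EscapeDataString(targetUrl)}";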

Since ZenRows doesn't provide a dedicated NuGet package for C#, you'll have to send an HTTP GET request to the API URL yourself. ZenRows will scrape the target URL on your behalf and return the plain HTML in the response.

To do this, create a C# Console Application in Visual Studio and add the following code to the program's Main() method.

program.cs
var url = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fscrapeme.live%2Fshop"; 
var request = WebRequest.Create(url); 
request.Method = "GET";

This code creates a new WebRequest object targeting the ZenRows API URL. You can then send the request and read the response from the returned WebResponse object.

program.cs
using var webResponse = request.GetResponse(); 
using var webStream = webResponse.GetResponseStream();

The plain HTML response is fetched into the webResponse object and exposed as a byte stream for easy reading. Now that you have the plain HTML of the target page in the stream, you can use any HTML parser to extract the desired elements.
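As an aside, if you just want the raw markup as a plain string (for logging or a different parser), you could read the stream with a StreamReader instead of passing it straight to a parser:

program.cs
// alternative: read the response stream into a string (note that this consumes the stream,
// so you'd then call doc.LoadHtml(rawHtml) instead of doc.Load(webStream))
using var reader = new StreamReader(webStream);
string rawHtml = reader.ReadToEnd();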

Let's use HTML Agility Pack to extract the products' names and prices.

Install HTML Agility Pack and write a ParseHtml() function in the Program class to parse and print the product names and prices.

program.cs
private static void ParseHtml(Stream html) 
{ 
	var doc = new HtmlDocument(); 
	doc.Load(html); 
 
	HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2"); 
	HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
	for (int i = 0; i < names.Count; i++) 
	{ 
		Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText); 
	} 
}

The ParseHtml() function creates an HtmlDocument instance, doc, and loads the passed byte stream into it. It then uses the SelectNodes() method to select the elements holding the product names and prices.

The SelectNodes() method takes an XPath expression to extract elements from an HTML document. The XPath "//a/h2" selects all <h2> elements enclosed in an anchor tag <a>, which contain the product names.

Similarly, the XPath "//div/main/ul/li/a/span" references all the <span> elements containing product prices. The for loop prints the InnerText of each parsed element. Let's call ParseHtml() from the Main() method with the webStream byte stream as an argument.

program.cs
ParseHtml(webStream);

Congratulations!

You have just crawled a web page using the ZenRows C# web scraping library without being blocked by any anti-bot.

ZenRows Output

Here's what the complete code looks like.

program.cs
using HtmlAgilityPack; 
using System; 
using System.IO; 
using System.Net; 
 
namespace ZenRowsDemo 
{ 
	class Program 
	{ 
		static void Main(string[] args) 
		{ 
			var url = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fscrapeme.live%2Fshop";
			var request = WebRequest.Create(url); 
			request.Method = "GET"; 
			using var webResponse = request.GetResponse(); 
			using var webStream = webResponse.GetResponseStream(); 
			ParseHtml(webStream); 
		} 
 
		private static void ParseHtml(Stream html) 
		{ 
			var doc = new HtmlDocument(); 
			doc.Load(html); 
 
			HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2"); 
			HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
			for (int i = 0; i < names.Count; i++) 
			{ 
				Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText); 
			} 
		} 
	} 
}
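One note on the code above: WebRequest is marked obsolete in .NET 6 and later. If you're on a recent runtime, the same flow works with HttpClient; here's a minimal sketch of that variant, with the same placeholder API key:

program.cs
using HtmlAgilityPack; 
using System; 
using System.Net.Http; 
using System.Threading.Tasks; 
 
namespace ZenRowsHttpClientDemo 
{ 
	class Program 
	{ 
		static async Task Main(string[] args) 
		{ 
			var url = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fscrapeme.live%2Fshop"; 
			using var client = new HttpClient(); 
			// ZenRows fetches the target and returns its plain HTML 
			var html = await client.GetStringAsync(url); 
 
			var doc = new HtmlDocument(); 
			doc.LoadHtml(html); 
			var names = doc.DocumentNode.SelectNodes("//a/h2"); 
			var prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
			for (int i = 0; i < names.Count && i < prices.Count; i++) 
			{ 
				Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText); 
			} 
		} 
	} 
}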

2. Puppeteer-Sharp

Puppeteer-Sharp is a C# web scraping library that crawls web pages using a headless browser. Its benefits include scraping dynamic web pages, headless browser support, and generating PDFs and screenshots of web pages.

There are some downsides: it requires manual proxy integration, provides no anti-bot bypass, and leaves infrastructure scaling up to you.

Let's take a look at how to crawl a web page using this library.

How to Scrape a Web Page in C# with Puppeteer-Sharp?

Create a C# Console Application in Visual Studio (or the IDE you prefer) and then install the PuppeteerSharp package through the NuGet Package Manager, as shown below.

Puppeteer Setup

Repeat the same procedure to install the AngleSharp library. It will come in handy for parsing the data crawled with the PuppeteerSharp package.

Once that's done, let's go ahead and do some C# web scraping.

The first step is to include the required library files in your Program.cs file.

program.cs
using PuppeteerSharp; 
using AngleSharp; 
using AngleSharp.Dom;

Once that's done, launch a headless Chrome instance using Puppeteer and fetch content from the same page we've just used.

program.cs
// download the browser executable
await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions 
{ 
	Headless = true
}); 
var page = await browser.NewPageAsync(); 
await page.GoToAsync("https://scrapeme.live/shop/"); 

var content = await page.GetContentAsync(); 

The code launches a headless Chrome browser instance.

The GoToAsync() function navigates the browser to the given URL. The GetContentAsync() method retrieves the current page's raw HTML content.

Let's extract the product names and prices. The snippet below evaluates JavaScript selectors directly in the headless page through Puppeteer; an AngleSharp-based alternative that parses the raw HTML follows afterward.

program.cs
var jsSelectAllNames = @"Array.from(document.querySelectorAll('h2')).map(a => a.innerText);"; 
 
var jsSelectAllPrices = @"Array.from(document.querySelectorAll('span[class=""price""]')).map(a => a.innerText);"; 
 
var names = await page.EvaluateExpressionAsync<string[]>(jsSelectAllNames); 
 
var prices = await page.EvaluateExpressionAsync<string[]>(jsSelectAllPrices); 
 
for (int i=0; i < names.Length; i++) 
{ 
	Console.WriteLine("Name: {0}, Price: {1}", names[i], prices[i]); 
}

The querySelectorAll() function selects the desired elements, and map() transforms each matched element into its innerText.

The EvaluateExpressionAsync() method runs each expression in the page and returns the matching results as string arrays of product names and prices.
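As promised, here's what the AngleSharp route could look like if you'd rather parse the HTML string returned by GetContentAsync() than evaluate JavaScript in the page. This is a sketch based on the same page markup, and Select()/ToArray() require a using System.Linq; directive:

program.cs
// parse the HTML string returned by GetContentAsync() with AngleSharp
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(content));

// TextContent returns the visible text of each matched element
var nameTexts = document.QuerySelectorAll("h2").Select(e => e.TextContent).ToArray();
var priceTexts = document.QuerySelectorAll("span.price").Select(e => e.TextContent).ToArray();

for (int i = 0; i < nameTexts.Length && i < priceTexts.Length; i++)
{
	Console.WriteLine("Name: {0}, Price: {1}", nameTexts[i], priceTexts[i]);
}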

And there you have it, a web page successfully scraped with Puppeteer C# crawling library.

Puppeteer Scraper Output

Here's what the full code looks like.

program.cs
using PuppeteerSharp; 
using System; 
using System.Threading.Tasks; 
using AngleSharp; 
using AngleSharp.Dom; 
using System.Linq; 
using System.Collections.Generic; 
 
namespace PuppeteerDemo 
{ 
	class Program 
	{ 
		static async Task Main(string[] args) 
		{ 
			// download the browser executable
            await new BrowserFetcher().DownloadAsync();
			var browser = await Puppeteer.LaunchAsync(new LaunchOptions 
			{ 
				Headless = true
			}); 
			var page = await browser.NewPageAsync(); 
			await page.GoToAsync("https://scrapeme.live/shop/"); 
 
			var content = await page.GetContentAsync(); 
 
			List<Products> products = new List<Products>(); 
 
			var jsSelectAllNames = @"Array.from(document.querySelectorAll('h2')).map(a => a.innerText);"; 
 
			var jsSelectAllPrices = @"Array.from(document.querySelectorAll('span[class=""price""]')).map(a => a.innerText);"; 
 
			var names = await page.EvaluateExpressionAsync<string[]>(jsSelectAllNames); 
			var prices = await page.EvaluateExpressionAsync<string[]>(jsSelectAllPrices); 
			for (int i=0; i < names.Length && i < prices.Length; i++) 
			{ 
				Products p = new Products(); 
				p.Name = names[i]; 
				p.Price = prices[i]; 
				products.Add(p); 
			} 
			foreach(var p in products) 
			{ 
				Console.WriteLine("Name: {0}, Price: {1}", p.Name, p.Price); 
			} 
		} 
	} 
	class Products 
	{ 
		private string name; 
		private string price; 
 
		public string Name { get => name; set => name = value; } 
		public string Price { get => price; set => price = value; } 
	} 
}

3. Selenium WebDriver

Selenium is one of the most used tools for crawling large amounts of data, like photos, links, and text. It's ideal for dynamic web pages because it can render content produced with JavaScript.

The downside to the Selenium C# scraping library is that it requires manual proxy integration and offers no built-in anti-bot bypass.

How to Scrape a Web Page in C# with Selenium?

Building a web crawler with Selenium requires two external packages: Selenium.WebDriver and Selenium.WebDriver.ChromeDriver. You can install both through the NuGet Package Manager.

Selenium Setup

Once you have those installed, launch a headless Chrome driver, specifying the options you need (headless mode, GPU support, etc.).

program.cs
string fullUrl = "https://scrapeme.live/shop/"; 
var options = new ChromeOptions(); 

options.AddArguments(new List<string>() { "headless", "disable-gpu" }); 
 
var browser = new ChromeDriver(options); 
browser.Navigate().GoToUrl(fullUrl);

After the browser navigates to the desired URL, extract the elements using the FindElements function.

program.cs
var names = browser.FindElements(By.TagName("h2")); 
var prices = browser.FindElements(By.CssSelector("span.price")); 
 
for (int i = 0; i < names.Count && i < prices.Count; i++) 
{ 
	Console.WriteLine("Name: {0}, Price: {1}", names[i].Text, prices[i].Text); 
}
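If the page renders its listings after the initial navigation, you may want to wait for the elements before reading them. Here's a small optional sketch with WebDriverWait (it lives in OpenQA.Selenium.Support.UI and may require the Selenium.Support package); the 10-second timeout is an arbitrary choice:

program.cs
// wait up to 10 seconds for at least one price element to appear before scraping
var wait = new WebDriverWait(browser, TimeSpan.FromSeconds(10));
wait.Until(driver => driver.FindElements(By.CssSelector("span.price")).Count > 0);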

And that's it! Here's what your output will look like.

Selenium Scraper Output

If you got lost along the way, here's the complete code used.

program.cs
using OpenQA.Selenium.Chrome; 
using System; 
using OpenQA.Selenium; 
using System.Collections.Generic; 
using System.Threading.Tasks; 
 
namespace SeleniumDemo { 
	class Program { 
		static void Main(string[] args) { 
			string fullUrl = "https://scrapeme.live/shop/"; 
 
			var options = new ChromeOptions(); 
 
			options.AddArguments(new List<string>() { 
				"headless", 
				"disable-gpu" 
			}); 
 
			var browser = new ChromeDriver(options); 
			List<Products> products = new List<Products>(); 
			browser.Navigate().GoToUrl(fullUrl); 
 
			var names = browser.FindElements(By.TagName("h2")); 
			var prices = browser.FindElements(By.CssSelector("span.price")); 
			for (int i = 0; i < names.Count && i < prices.Count; i++) { 
				Products p = new Products(); 
				p.Name = names[i].GetAttribute("innerText"); 
				p.Price = prices[i].GetAttribute("innerText"); 
				products.Add(p); 
			} 
			foreach(var p in products) { 
				Console.WriteLine("Name: {0}, Price: {1}", p.Name, p.Price); 
			} 
		} 
	} 
	class Products { 
		string name; 
		string price; 
 
		public string Name { 
			get => name; 
			set => name = value; 
		} 
		public string Price { 
			get => price; 
			set => price = value; 
		} 
	} 
}

4. HTML Agility Pack

HTML Agility Pack is the most downloaded C# scraping library, largely thanks to its HTML parser, which can download web pages directly or through a browser. It can tackle broken HTML, supports XPath, and can also parse local HTML files.

That said, this C# web crawling library lacks headless browser support and needs external proxy services to get past anti-bot systems.
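To illustrate the local-file support mentioned above, you can point HtmlDocument.Load() at a file on disk, which is handy for testing a parser offline. A minimal sketch, where products.html is a hypothetical saved copy of the page:

program.cs
using HtmlAgilityPack; 
 
var doc = new HtmlDocument(); 
// products.html is a hypothetical local copy of the target page 
doc.Load("products.html"); 
var names = doc.DocumentNode.SelectNodes("//a/h2");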

How to Scrape a Web Page in C# with HTML Agility Pack?

Create a C# console application and open the NuGet Package Manager from the Tools menu. Type HTML Agility Pack in the search bar under the browse tab, select the appropriate "HtmlAgilityPack" version, and click "Install".

Install HTML Agility Pack

Once the tools are installed, let's extract some data. The first step is to include the required library files in your Program.cs file.

program.cs
using HtmlAgilityPack;

Then create a GetDocument() helper method to load the HTML from the URL and return the document.

program.cs
static HtmlDocument GetDocument(string url) 
{ 
	HtmlWeb web = new HtmlWeb(); 
	HtmlDocument doc = web.Load(url); 
	return doc; 
}

You can use GetDocument() to retrieve the HTML contents of any target page. The last step is to write the Main() driver code like this.

program.cs
var doc = GetDocument("https://scrapeme.live/shop/"); 
 
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2"); 
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
for (int i = 0; i < names.Count; i++){ 
	Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText); 
} 

The GetDocument() function retrieves the HTML contents of the given target. The SelectNodes() function uses the relevant XPath selectors to parse the names and prices of the products on the target web page. The for loop prints all the InnerText of the parsed elements.

That's it! ScrapeMe has just been crawled using the HTML Agility Pack C# web crawling library.

HTML Agility Scraper Output

Here's what the full code looks like.

program.cs
using HtmlAgilityPack; 
using System; 
using System.Collections.Generic; 
using System.Linq; 
 
namespace HAPDemo 
{ 
	class Program 
	{ 
		static void Main(string[] args) 
		{ 
			var doc = GetDocument("https://scrapeme.live/shop/"); 
 
			HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2"); 
			HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
			List<Products> products = new List<Products>(); 
 
			for (int i = 0; i < names.Count && i < prices.Count; i++) 
			{ 
				Products p = new Products(); 
				p.Name= names[i].InnerText; 
				p.Price = prices[i].InnerText; 
				products.Add(p); 
			} 
			foreach (var p in products) 
			{ 
				Console.WriteLine("Name: {0} , Price: {1}", p.Name, p.Price); 
			} 
		} 
		static HtmlDocument GetDocument(string url) 
		{ 
			HtmlWeb web = new HtmlWeb(); 
			HtmlDocument doc = web.Load(url); 
			return doc; 
		} 
	} 
 
	class Products 
	{ 
		private string name; 
		private string price; 
 
		public string Name { get => name; set => name = value; } 
		public string Price { get => price; set => price = value; } 
	} 
}

5. ScrapySharp

ScrapySharp is an open-source C# web crawling library that extends HTML Agility Pack with a web client able to emulate a real browser and jQuery-like CSS selectors.

It significantly reduces the setup work often associated with scraping a web page, and its tight integration with HTML Agility Pack lets you easily work with the retrieved HTML content.

ScrapySharp can simulate a web browser and handle cookie tracking, redirects, and other high-level operations.
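To give you a taste of those jQuery-like selectors before the full walkthrough below, ScrapySharp's CssSelect() extension (from the ScrapySharp.Extensions namespace) lets you query nodes with CSS selectors instead of XPath. A minimal sketch, assuming the same ScrapeMe page:

program.cs
using ScrapySharp.Extensions; 
using ScrapySharp.Network; 
using System; 
 
var browser = new ScrapingBrowser(); 
var page = browser.NavigateToPage(new Uri("https://scrapeme.live/shop/")); 
 
// CssSelect() is ScrapySharp's jQuery-style selector over HTML Agility Pack nodes 
foreach (var node in page.Html.CssSelect("a h2")) 
{ 
	Console.WriteLine(node.InnerText); 
}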

The downsides of using the ScrapySharp C# scraping library are that it requires manual proxy and anti-bot handling and doesn't automatically parse the crawled content.

How to Scrape a Web Page in C# with ScrapySharp?

Create a C# console application project and install the latest ScrapySharp package through NuGet Package Manager. Open the Program.cs file from your console application and include the following required libraries.

program.cs
using HtmlAgilityPack; 
using ScrapySharp.Network; 
using System; 
using System.Linq;

Next, create a static ScrapingBrowser object, mybrowser, in the Program class. You'll use this object to navigate to and crawl the target URLs.

program.cs
static ScrapingBrowser mybrowser = new ScrapingBrowser();

Create a helper method to retrieve and return the HTML content of the given target URL.

program.cs
static HtmlNode GetHtml(string url) 
{ 
	WebPage webpage = mybrowser.NavigateToPage(new Uri(url)); 
	return webpage.Html; 
}

The GetHtml helper method takes the target URL as a parameter and sends it to the NavigateToPage() method of the mybrowser object, returning the webpage's HTML.

Let's create one more helper method to extract and print the product names and prices.

program.cs
static void ScrapeNamesAndPrices(string url) 
{ 
	var html = GetHtml(url); 
 
	var nameNodes = html.OwnerDocument.DocumentNode.SelectNodes("//a/h2"); 
	var priceNodes = html.OwnerDocument.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
	foreach (var (n, p) in nameNodes.Zip(priceNodes)) 
	{ 
		Console.WriteLine("Name: {0} , Price: {1}", n.InnerText, p.InnerText); 
	} 
}

This snippet uses the GetHtml() method to retrieve the URL's HTML and parses it using the SelectNodes() function, which takes the XPath selectors for the product names and prices and returns all matching elements.

The foreach loop gets the InnerText of each element in the nameNodes and priceNodes collections and prints them on the console. For a final touch, add the driver code to put everything in order.

program.cs
static void Main(string[] args) 
{ 
	ScrapeNamesAndPrices("https://scrapeme.live/shop/"); 
}

And there you have it.

ScrapySharp Scraper Output

Here's what the final code looks like.

program.cs
using HtmlAgilityPack; 
using ScrapySharp.Network; 
using System; 
using System.Linq; 
namespace ScrapyDemo 
{ 
	class Program 
	{ 
		static ScrapingBrowser mybrowser = new ScrapingBrowser(); 
		static void Main(string[] args) 
		{ 
			ScrapeNamesAndPrices("https://scrapeme.live/shop/"); 
		} 
		static HtmlNode GetHtml(string url) 
		{ 
			WebPage webpage = mybrowser.NavigateToPage(new Uri(url)); 
			return webpage.Html; 
		} 
		static void ScrapeNamesAndPrices(string url) 
		{ 
			var html = GetHtml(url); 
			var nameNodes = html.OwnerDocument.DocumentNode.SelectNodes("//a/h2"); 
			var priceNodes = html.OwnerDocument.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
			foreach (var (n, p) in nameNodes.Zip(priceNodes)) 
			{ 
				Console.WriteLine("Name: {0} , Price: {1}", n.InnerText, p.InnerText); 
			} 
		} 
	} 
}

6. IronWebScraper

IronWebScraper is a .NET C# web scraping library used to extract and parse data from online sources. It lets you control which objects, sites, media, and other elements are allowed or disallowed during a crawl.

Other features include support for multiple identities and a web cache.

The main drawbacks of using this C# web scraper library are as follows:

  • It doesn't support crawling dynamic content.
  • It requires manual proxy integration.

How to Scrape a Web Page in C# with IronWebScraper?

Setting up the development environment for this C# crawler library is pretty simple. Install the IronWebScraper package to your C# Console Project, and you're good to go!

IronWebScraper Setup

The WebScraper class in IronWebScraper has two abstract methods: Init and Parse. The Init method initializes the web request and fetches the response. This response is then passed to the Parse function to extract the required elements from the HTML content.

To create your IronWebScraper, you must inherit the WebScraper class and override these two abstract methods. If you have IronScraperDemo as your main scraper class in the Program.cs file, implement the WebScraper class like this.

program.cs
class IronScraperDemo : WebScraper 
{ 
	public override void Init() 
	{ 
		License.LicenseKey = "ENTER YOUR LICENSE KEY HERE"; 
		this.LoggingLevel = WebScraper.LogLevel.All; 
		this.Request("https://scrapeme.live/shop/", Parse); 
	} 
 
	public override void Parse(Response response) 
	{ 
		var names = response.Css("h2"); 
		var prices = response.Css("span.price"); 
		for(int i=0; i<names.Length;i++) 
		{ 
			Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText); 
		} 
	} 
}

The Init method requires a license key, which you can get by creating an account on the IronWebScraper website. Init then requests the target URL and passes the received response to the Parse method.

In our case, the Parse method uses the CSS selectors to extract the product names and prices from the response and prints them out on the output console.

Congratulations! Your IronWebScraper is ready to work. Just add the following driver code to create an object of your crawler and call its Start() method.

program.cs
static void Main(string[] args) 
{ 
	IronScraperDemo ironScraper = new IronScraperDemo(); 
	ironScraper.Start(); 
}

Using the IronWebScraper to crawl the target webpage, your output should look like this.

IronWebScraper Output

Here's the complete Program.cs code for this example.

program.cs
using IronWebScraper; 
using System; 
using System.Collections.Generic; 
 
namespace IronScrapDemo 
{ 
	class IronScraperDemo : WebScraper 
	{ 
		List<Products> products = new List<Products>(); 
		static void Main(string[] args) 
		{ 
			IronScraperDemo ironScraper = new IronScraperDemo(); 
			ironScraper.Start(); 
		} 
		public override void Init() 
		{ 
			License.LicenseKey = "ENTER YOUR LICENSE KEY HERE"; 
			this.LoggingLevel = WebScraper.LogLevel.All; // All Events Are Logged 
			this.Request("https://scrapeme.live/shop/", Parse); 
		} 
 
		public override void Parse(Response response) 
		{ 
			var names = response.Css("h2"); 
			var prices = response.Css("span.price"); 
			for (int i=0; i < names.Length; i++) 
			{ 
				Products p = new Products(); 
				p.Name = names[i].InnerText; 
				p.Price = prices[i].InnerText; 
				products.Add(p); 
			} 
			foreach(var p in products) 
			{ 
				Console.WriteLine("Name: {0}, Price: {1}", p.Name, p.Price); 
			} 
		} 
	} 
	class Products 
	{ 
		public String Name 
		{ 
			get; 
			set; 
		} 
		public String Price 
		{ 
			get; 
			set; 
		} 
	} 
}

Remark: you still need to set up the Development Environment as per the instructions in this section.

7. HttpClient

HttpClient is .NET's built-in HTTP client; its async API makes it easy to fetch the raw HTML of a target URL. However, you still need an HTML parsing tool to extract the desired data.

How to Scrape a Web Page in C# with HttpClient?

HttpClient is a built-in .NET class and doesn't require any external assembly, but you do have to install HTML Agility Pack as an external dependency through the NuGet Package Manager.

To get started, include the following assemblies in the "Program.cs" file under the C# Console Application.

program.cs
using System; 
using System.Threading.Tasks; 
using System.Net.Http; 
using HtmlAgilityPack;

The System.Threading.Tasks namespace supports the asynchronous calls, System.Net.Http provides HttpClient for sending the HTTP requests, and HtmlAgilityPack is used to parse the retrieved HTML content.

Add a GetHtmlContent() method in the Program class, like this.

program.cs
private static Task<string> GetHtmlContent() 
{ 
	var hclient = new HttpClient(); 
	return hclient.GetStringAsync("https://scrapeme.live/shop"); 
}
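If the target rejects requests without browser-like headers, you can set default headers on the client before sending the request. Here's a hedged variation on the helper above; the User-Agent string is just an example:

program.cs
private static Task<string> GetHtmlContent() 
{ 
	var hclient = new HttpClient(); 
	// example browser-like User-Agent; adjust or remove as needed 
	hclient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"); 
	return hclient.GetStringAsync("https://scrapeme.live/shop"); 
}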

You can pass this string response to the ParseHtml() method to extract and show the desired data.

program.cs
private static void ParseHtml(string html) 
{ 
	var doc = new HtmlDocument(); 
	doc.LoadHtml(html); 
 
	HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2"); 
 
	HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
	for (int i = 0; i < names.Count; i++){ 
		Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText); 
	} 
}

The ParseHtml() method above takes an HTML string as input, parses it using the SelectNodes() method, and displays the names and prices on the console.

Here the SelectNodes() function plays the key parsing role. It extracts only the relevant elements from the HTML string according to the given XPath selectors.

To wrap it up, let's look at the driver code to execute everything in order.

program.cs
static async Task Main(string[] args) 
{ 
	var html = await GetHtmlContent(); 
	ParseHtml(html); 
}

Notice that Main() is now an async method. Awaited calls can only appear inside an async method, so Main() has to be asynchronous.

As with the other C# web scraping libraries discussed in this article, your output will look like the following. Check our guide on using a proxy with HttpClient in C# for more.

HttpClient Scraper Output
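On the proxy guide mentioned above: routing HttpClient through a proxy usually goes through HttpClientHandler. Here's a minimal sketch with placeholder proxy values, separate from the scraper code in this section (it also needs using System.Net; for WebProxy):

program.cs
// placeholder proxy endpoint; replace <PROXY_HOST>:<PROXY_PORT> with a real address
var handler = new HttpClientHandler
{
	Proxy = new WebProxy("http://<PROXY_HOST>:<PROXY_PORT>"),
	UseProxy = true
};
var proxiedClient = new HttpClient(handler);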

Here's what all the code chunks joined together look like.

program.cs
using System; 
using System.Threading.Tasks; 
using System.Collections.Generic; 
using System.Net.Http; 
using HtmlAgilityPack; 
namespace HTTPDemo 
{ 
	class Program 
	{ 
		static async Task Main(string[] args) 
		{ 
			var html = await GetHtmlContent(); 
			List<Products> products = ParseHtml(html); 
			foreach (var p in products) 
			{ 
				Console.WriteLine("Name: {0} , Price: {1}", p.Name, p.Price); 
			} 
		} 
 
		private static Task<string> GetHtmlContent() 
		{ 
			var hclient = new HttpClient(); 
			return hclient.GetStringAsync("https://scrapeme.live/shop"); 
		} 
 
		private static List<Products> ParseHtml(string html) 
		{ 
			var doc = new HtmlDocument(); 
			doc.LoadHtml(html); 
 
			HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2"); 
			HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span"); 
 
			List<Products> products = new List<Products>(); 
 
			for (int i = 0; i < names.Count && i < prices.Count; i++) 
			{ 
				Products p = new Products(); 
				p.Name = names[i].InnerText; 
				p.Price = prices[i].InnerText; 
				products.Add(p); 
			} 
 
			return products; 
		} 
	} 
	class Products 
	{ 
		public String Name 
		{ 
			get; 
			set; 
		} 
		public String Price 
		{ 
			get; 
			set; 
		} 
	} 
}

Conclusion

We tested them in depth and presented the seven best C# web scraping libraries in 2024. As seen, they are:

  1. ZenRows.
  2. Puppeteer-Sharp.
  3. Selenium.
  4. HTML Agility Pack.
  5. ScrapySharp.
  6. IronWebScraper.
  7. HttpClient.

A common challenge among scrapers is their inability to crawl a web page without triggering anti-bots. ZenRows solves this scraping problem by handling all anti-bot bypass for you, removing the headaches involved. You can test ZenRows for free to see what it's capable of.

Frequent Questions

What Is the Best C# Web Scraping Library?

ZenRows is the best option for web scraping in C#. Unlike other C# web scraping libraries, it doesn't require manual integration of proxies or setting up anti-bot logic.

It also supports scraping dynamic content and offers auto-parsing features.

HTML Agility Pack is the most popular library, with around 83 million downloads. Its popularity is mainly due to its HTML parser, which can download web pages directly or through a browser.

What Is a Good C# Web Scraping Library?

A good scraping library provides efficient mechanisms to scrape and parse content from your targets while avoiding blocks (including anti-bots and CAPTCHAs) and without exposing your IP.

Additionally, it should support scraping dynamic content and should have comprehensive documentation and active community support. ZenRows, Puppeteer-Sharp, and Selenium are good options for web scraping using C#.

Which Libraries Are Used for Web Scraping in C#?

Several libraries can be used for web scraping in C#. The best ones are:

  1. ZenRows Web Scraper API.
  2. Puppeteer-Sharp.
  3. Selenium WebDriver.
  4. HTML Agility Pack.
  5. ScrapySharp.
  6. IronWebScraper.
  7. HttpClient.
