7 Best C# Web Scraping Libraries in 2023
There are many C# web scraping libraries for extracting data, for purposes like price tracking, lead generation, sentiment monitoring, and financial data aggregation.
Several criteria matter when choosing the best library for scraping, and in this article we'll discuss the 7 best C# web scraping libraries to use in 2023. We'll also walk through examples to help you understand how these frameworks work.
Let's dive in!
What are the Best C# Web Scraping Libraries?
- ZenRows.
- Puppeteer Sharp.
- Selenium.
- HTML Agility Pack.
- Scrapy Sharp.
- Iron Web Scraper.
- HttpClient.
The C# libraries were compared based on the core features that make web scraping smooth, like proxy configuration, dynamic content, documentation, anti-bot bypass, auto-parsing and infrastructure scalability. Here's a quick comparison:
| | ZenRows | Puppeteer | Selenium | HTML Agility Pack | Scrapy | Iron Web Scraper | HttpClient |
|---|---|---|---|---|---|---|---|
| Proxy Configuration | AUTO | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL |
| Dynamic Content | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Documentation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anti-Bot Bypass | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Auto-Parse HTML | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Infrastructure Scalability | AUTO | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL | MANUAL |
Let's go ahead and talk about these C# crawler libraries based on the features they come with and how they can be used to extract data from a web page. We'll use the ScrapeMe website as a reference.

1. ZenRows Web Scraper API
ZenRows API is the best C# web scraping library on this list. It's an API that handles anti-bot bypass from rotating proxies and headless browsers to CAPTCHAs. Additionally, it supports auto-parsing (i.e. HTML to JSON parsing) for many popular sites, and it can extract dynamic content.
The only downside is that ZenRows doesn't ship a dedicated C# NuGet package for its Web Scraping API, so you need to send the HTTP requests with a standard HTTP client.
How to Scrape a Web Page in C# with ZenRows
Create a free account on ZenRows to get your API key. You'll get to the following Request Builder screen:

After you add the URL you want to crawl, send an HTTP GET request to it. This returns the page's plain HTML, which you can then extract data from with any HTML parser.

If you use https://scrapeme.live/shop as the URL to scrape, the ZenRows API URL should look like this:
https://api.zenrows.com/v1/?apikey=API_KEY&url=https%3A%2F%2Fscrapeme.live%2Fshop
Note that the API key is a personal ID assigned to you by ZenRows and shouldn't be shared with anyone.
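If you want to build that URL in code, you can URL-encode the target with Uri.EscapeDataString(). Here's a minimal sketch, where API_KEY is a placeholder for your own key:
// Percent-encode the target URL so it can be passed safely as a query parameter.
var target = "https://scrapeme.live/shop";
var apiUrl = $"https://api.zenrows.com/v1/?apikey=API_KEY&url={Uri.EscapeDataString(target)}";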
Since ZenRows doesn't provide a dedicated NuGet package for C#, you'll have to send an HTTP GET request to the API URL yourself. ZenRows will scrape the target URL on your behalf and return the plain HTML in the response. To do this, create a C# Console Application in Visual Studio and add the following code to the program's Main() function:
var url = "https://api.zenrows.com/v1/?apikey=API_KEY&url=https%3A%2F%2Fscrapeme.live%2Fshop";
var req = WebRequest.Create(url);
req.Method = "GET";
This code creates a new WebRequest object with the ZenRows API URL as the target. You can then send the request and read the response through a WebResponse object:
using var resp = req.GetResponse();
using var webStream = resp.GetResponseStream();
The plain HTML response is fetched into the resp object and then exposed as a byte stream for easy reading. Now that you have the plain HTML of the target page in the stream, you can use any HTML parser to extract the desired elements.
Let's use HTML Agility Pack to extract the products' names and prices. Install the Agility Pack and write the ParseHtml() function in the Program class to parse and print the product names and prices.
private static void ParseHtml(Stream html)
{
var doc = new HtmlDocument();
doc.Load(html);
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2");
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
for (int i = 0; i < names.Count && i < prices.Count; i++)
{
Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText);
}
}
The ParseHtml() function creates an HtmlDocument instance, doc, and loads the passed byte stream into it. Then it uses the SelectNodes() method to select all the elements with product names and prices.
The SelectNodes() method takes an XPath expression to extract elements from an HTML document. The XPath "//a/h2" selects all <h2> elements enclosed in an anchor tag <a>, which contain the product names.
Similarly, the XPath "//div/main/ul/li/a/span" references all the <span> elements containing product prices. The for loop prints the InnerText of these parsed elements. Let's call the ParseHtml() method from the Main() function with the webStream byte stream as an argument.
ParseHtml(webStream);
Congratulations! 👏
You have just crawled a web page using the ZenRows C# web scraping library… without being blocked by any anti-bot.

Here's what the full code looks like:
using HtmlAgilityPack;
using System;
using System.IO;
using System.Net;
namespace ZenRowsDemo
{
class Program
{
static void Main(string[] args)
{
var url = "https://api.zenrows.com/v1/?apikey=PUT_YOUR_ZENROWS_API_KEY_HERE&url=https%3A%2F%2Fscrapeme.live%2Fshop";
var request = WebRequest.Create(url);
request.Method = "GET";
using var webResponse = request.GetResponse();
using var webStream = webResponse.GetResponseStream();
ParseHtml(webStream);
}
private static void ParseHtml(Stream html)
{
var doc = new HtmlDocument();
doc.Load(html);
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2");
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
for (int i = 0; i < names.Count && i < prices.Count; i++)
{
Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText);
}
}
}
}
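Note that WebRequest.Create() is marked obsolete in .NET 6 and later. If you're on a recent .NET version, here's a minimal sketch of the same request made with HttpClient instead; API_KEY is the same placeholder as above:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
namespace ZenRowsHttpClientDemo
{
    class Program
    {
        static async Task Main()
        {
            // Same ZenRows endpoint as above; replace API_KEY with your own key.
            var url = "https://api.zenrows.com/v1/?apikey=API_KEY&url=https%3A%2F%2Fscrapeme.live%2Fshop";
            using var client = new HttpClient();
            var html = await client.GetStringAsync(url);
            // Parse the returned HTML with HTML Agility Pack, as in the example above.
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            var names = doc.DocumentNode.SelectNodes("//a/h2");
            var prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
            for (int i = 0; i < names.Count && i < prices.Count; i++)
            {
                Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText);
            }
        }
    }
}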
2. Puppeteer Sharp
Puppeteer Sharp is a C# web scraping library that crawls web pages using a headless browser. Its benefits include the ability to scrape dynamic web pages, headless browser support, and the ability to generate PDFs and screenshots of web pages.
There are some downsides: it requires manual proxy integration, it doesn't provide any anti-bot bypass, and you have to manage your infrastructure scaling yourself.
Let's take a look at how easy it is to crawl a web page using this library.
How to Scrape a Web Page in C# with Puppeteer Sharp
Create a C# Console Application in Visual Studio (or the IDE you prefer) and then install the PuppeteerSharp package through the NuGet Package Manager, as shown:

Repeat the same procedure to install the AngleSharp library. It will come in handy for parsing the data crawled with the PuppeteerSharp package.
Once that's done, let's go ahead and do some C# web scraping.
The first step is to include the required library files in your Program.cs file.
using PuppeteerSharp;
using AngleSharp;
using AngleSharp.Dom;
Once that's done, launch a headless Chrome instance using Puppeteer and fetch content from the same page we've just used:
using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync(BrowserFetcher.DefaultChromiumRevision);
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
Headless = true,
ExecutablePath= @"C:\Program Files\Google\Chrome\Application\chrome.exe"
});
var page = await browser.NewPageAsync();
await page.GoToAsync("https://scrapeme.live/shop/");
var content = await page.GetContentAsync();
The code launches a headless Chrome browser instance. The LaunchAsync function requires the path to the Chrome installation directory, which may be different on your machine, so make sure to provide the correct path. The GoToAsync() function navigates the browser to the given URL, and the GetContentAsync() method retrieves the raw HTML content of the current page.
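Since the page object is already in hand, the PDF and screenshot generation mentioned earlier is a one-liner each. A minimal sketch (the output file names are arbitrary):
// Capture a screenshot and a PDF of the current page; the file names are arbitrary examples.
await page.ScreenshotAsync("scrapeme.png");
await page.PdfAsync("scrapeme.pdf");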
We installed AngleSharp to parse raw HTML, but for this page the simplest route is to evaluate JavaScript selectors directly in the browser through Puppeteer and read back the results (an AngleSharp alternative is sketched further below):
string jsSelectAllNames = @"Array.from(document.querySelectorAll('h2')).map(a => a.innerText);";
string jsSelectAllPrices = @"Array.from(document.querySelectorAll('span[class=""price""]')).map(a => a.innerText);";
var names = await page.EvaluateExpressionAsync<string[]>(jsSelectAllNames);
var prices = await page.EvaluateExpressionAsync<string[]>(jsSelectAllPrices);
for (int i=0; i < names.Length; i++)
{
Console.WriteLine("Name: {0}, Price: {1}", names[i], prices[i]);
}
The querySelectorAll() function selects the desired elements, and the map() function iterates over the resulting element array to keep only each element's inner text. The EvaluateExpressionAsync() method runs these expressions in the page and returns the matching product names and prices as string arrays.
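If you'd rather parse outside the browser, the AngleSharp package installed earlier can work on the content string returned by GetContentAsync(). A minimal sketch, assuming a using System.Linq; directive is in place:
// Parse the raw HTML string with AngleSharp instead of evaluating JavaScript in the page.
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(content));
var nameTexts = document.QuerySelectorAll("h2").Select(e => e.TextContent).ToArray();
var priceTexts = document.QuerySelectorAll("span.price").Select(e => e.TextContent).ToArray();
for (int i = 0; i < nameTexts.Length && i < priceTexts.Length; i++)
{
    Console.WriteLine("Name: {0}, Price: {1}", nameTexts[i], priceTexts[i]);
}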
And there you have it, a web page successfully scraped with Puppeteer C# crawling library:

Here's what the full code looks like:
using PuppeteerSharp;
using System;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;
using System.Linq;
using System.Collections.Generic;
namespace PuppeteerDemo
{
class Program
{
static async Task Main(string[] args)
{
using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync(BrowserFetcher.DefaultChromiumRevision);
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
Headless = true,
ExecutablePath= @"C:\Program Files\Google\Chrome\Application\chrome.exe"
});
var page = await browser.NewPageAsync();
await page.GoToAsync("https://scrapeme.live/shop/");
var content = await page.GetContentAsync();
List<Products> products = new List<Products>();
var jsSelectAllNames = @"Array.from(document.querySelectorAll('h2')).map(a => a.innerText);";
var jsSelectAllPrices = @"Array.from(document.querySelectorAll('span[class=""price""]')).map(a => a.innerText);";
var names = await page.EvaluateExpressionAsync<string[]>(jsSelectAllNames);
var prices = await page.EvaluateExpressionAsync<string[]>(jsSelectAllPrices);
for (int i=0; i < names.Length && i < prices.Length; i++)
{
Products p = new Products();
p.Name = names[i];
p.Price = prices[i];
products.Add(p);
}
foreach(var p in products)
{
Console.WriteLine("Name: {0}, Price: {1}", p.Name, p.Price);
}
}
}
class Products
{
private string name;
private string price;
public string Name { get => name; set => name = value; }
public string Price { get => price; set => price = value; }
}
}
3. Selenium Web Driver
Selenium is one of the most used tools for crawling large amounts of data, like photos, links and text. It's ideal for crawling dynamic web pages because of its ability to handle dynamic content produced using JavaScript.
The downside to the Selenium C# scraping library is that it requires manual proxy integration and provides no built-in anti-bot bypass.
How to Scrape a Web Page in C# with Selenium
Building a web crawler with Selenium requires two external packages: Selenium.WebDriver and Selenium.WebDriver.ChromeDriver. You can install both using the NuGet Package Manager:

Once you have those installed, launch the headless ChromeDriver, specifying its options (e.g. the browser path and GPU support):
string fullUrl = "https://scrapeme.live/shop/";
var options = new ChromeOptions(){
BinaryLocation = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
};
options.AddArguments(new List<string>() { "headless", "disable-gpu" });
var browser = new ChromeDriver(options);
browser.Navigate().GoToUrl(fullUrl);
After the browser navigates to the desired URL, extract the elements using the FindElements function.
var names = browser.FindElements(By.TagName("h2"));
var prices = browser.FindElements(By.CssSelector("span.price"));
for (int i = 0; i < names.Count && i < prices.Count; i++)
{
Console.WriteLine("Name: {0}, Price: {1}", names[i], prices[i]);
}
And that's it! Here's what your output will look like:

If you got lost along the way, here's the complete code:
using OpenQA.Selenium.Chrome;
using System;
using OpenQA.Selenium;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace SeleniumDemo {
class Program {
static void Main(string[] args) {
string fullUrl = "https://scrapeme.live/shop/";
var options = new ChromeOptions() {
BinaryLocation = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
};
options.AddArguments(new List<string>() {
"headless",
"disable-gpu"
});
var browser = new ChromeDriver(options);
List<Products> products = new List<Products>();
browser.Navigate().GoToUrl(fullUrl);
var names = browser.FindElements(By.TagName("h2"));
var prices = browser.FindElements(By.CssSelector("span.price"));
for (int i = 0; i < names.Count && i < prices.Count; i++) {
Products p = new Products();
p.Name = names[i].GetAttribute("innerText");
p.Price = prices[i].GetAttribute("innerText");
products.Add(p);
}
foreach(var p in products) {
Console.WriteLine("Name: {0}, Price: {1}", p.Name, p.Price);
}
}
}
class Products {
string name;
string price;
public string Name {
get => name;
set => name = value;
}
public string Price {
get => price;
set => price = value;
}
}
}
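Since Selenium's strong suit is dynamic pages, you'll often need to wait for JavaScript-rendered elements before reading them. Here's a minimal sketch using WebDriverWait; it assumes the OpenQA.Selenium.Support.UI namespace is available (shipped in the Selenium.Support package, or bundled with Selenium 4):
using OpenQA.Selenium.Support.UI;
// Wait up to 10 seconds for at least one price element to be rendered before scraping.
var wait = new WebDriverWait(browser, TimeSpan.FromSeconds(10));
wait.Until(driver => driver.FindElements(By.CssSelector("span.price")).Count > 0);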
4. HTML Agility Pack
HTML Agility Pack is the most downloaded C# DOM scraper library, thanks to its ability to download web pages directly or through a browser. It can handle broken HTML, supports XPath, and can parse local HTML files as well.
That said, this C# web crawling library has no support for headless scraping and it needs external proxy services to bypass anti-bots.
How to Scrape a Web Page in C# with HTML Agility Pack
Create a C# console application and open the NuGet Package Manager from the Tools menu. Type HTML Agility Pack in the search bar under the Browse tab, select the appropriate version of HtmlAgilityPack, and then click Install.

Once we've got the tools installed, let's go extract some data. The first step is to include the required library files in your Program.cs file:
using HtmlAgilityPack;
Then create a GetDocument() helper method that loads the HTML from the URL and returns its contents:
static HtmlDocument GetDocument(string url)
{
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
return doc;
}
You can use GetDocument() to retrieve the HTML contents of any target page. The last step is to write the Main() driver code, like this:
var doc = GetDocument("https://scrapeme.live/shop/");
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2");
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
for (int i = 0; i < names.Count && i < prices.Count; i++){
Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText);
}
The GetDocument() function retrieves the HTML contents of the given target. The SelectNodes() function uses the relevant XPath selectors to parse the names and prices of the products on the target web page, and the for loop prints the InnerText of all the parsed elements.
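One thing to keep in mind: SelectNodes() returns null when nothing matches the XPath, so it's worth guarding against that before looping. A minimal sketch:
var nodes = doc.DocumentNode.SelectNodes("//a/h2");
if (nodes == null)
{
    // No matches: the XPath may be wrong or the page layout may have changed.
    Console.WriteLine("No product names found.");
    return;
}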
That's it! ScrapeMe has just been crawled using the HTML Agility Pack C# web crawling library.

Here's what the full code looks like:
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp2
{
class Program
{
static void Main(string[] args)
{
var doc = GetDocument("https://scrapeme.live/shop/");
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2");
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
List<Products> products = new List<Products>();
for (int i = 0; i < names.Count && i < prices.Count; i++)
{
Products p = new Products();
p.Name= names[i].InnerText;
p.Price = prices[i].InnerText;
products.Add(p);
}
foreach (var p in products)
{
Console.WriteLine("Name: {0} , Price: {1}", p.Name, p.Price);
}
}
static HtmlDocument GetDocument(string url)
{
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
return doc;
}
}
class Products
{
private string name;
private string price;
public string Name { get => name; set => name = value; }
public string Price { get => price; set => price = value; }
}
}
5. Scrapy Sharp
Scrapy Sharp is an open-source C# web crawling library that combines an HtmlAgilityPack extension (adding jQuery-like CSS selectors) with a web client that can emulate a real browser.
It significantly reduces the setup work often associated with scraping a web page, and its combination with HtmlAgilityPack lets you access the retrieved HTML content easily. Scrapy Sharp can simulate a web browser, so it can handle cookies, redirects and other high-level operations.
The downsides of the Scrapy Sharp C# scraping library are that it requires manual proxy and anti-bot handling, and it doesn't support automatic parsing of the crawled content.
How to Scrape a Web Page in C# with Scrapy Sharp
Create a C# Console Application project and install the latest ScrapySharp package through the NuGet Package Manager. Open the Program.cs file from your Console Application and include the following required libraries.
using HtmlAgilityPack;
using ScrapySharp.Network;
using System;
using System.Linq;
Next, create a static ScrapingBrowser object in the Program class. You'll use this object, named mybrowser here, to navigate to and crawl the target URLs.
static ScrapingBrowser mybrowser = new ScrapingBrowser();
Create a helper method to retrieve and return the HTML content of the given target URL.
static HtmlNode GetHtml(string url)
{
WebPage webpage = mybrowser.NavigateToPage(new Uri(url));
return webpage.Html;
}
The GetHtml helper method takes the target URL as a parameter and sends it to the NavigateToPage() method of the mybrowser object, returning the webpage's HTML.
Let's create one more helper method to extract and print the product names and the prices:
static void ScrapeNamesAndPrices(string url)
{
var html = GetHtml(url);
var nameNodes = html.OwnerDocument.DocumentNode.SelectNodes("//a/h2");
var priceNodes = html.OwnerDocument.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
foreach (var (n, p) in nameNodes.Zip(priceNodes))
{
Console.WriteLine("Name: {0} , Price: {1}", n.InnerText, p.InnerText);
}
}
This snippet uses the GetHtml() method to retrieve the URL's HTML and parses it using the SelectNodes() function, which takes the XPaths of the product names and prices and returns all the matching elements. The foreach loop gets the InnerText of each element in the nameNodes and priceNodes collections and prints them on the console. For the final touch, add the driver code to put everything in order.
static void Main(string[] args)
{
ScrapeNamesAndPrices("https://scrapeme.live/shop/");
}
And there you have it:

Here's what the final code looks like:
using HtmlAgilityPack;
using ScrapySharp.Network;
using System;
using System.Linq;
namespace ScrapyDemo
{
class Program
{
static ScrapingBrowser mybrowser = new ScrapingBrowser();
static void Main(string[] args)
{
ScrapeNamesAndPrices("https://scrapeme.live/shop/");
}
static HtmlNode GetHtml(string url)
{
WebPage webpage = mybrowser.NavigateToPage(new Uri(url));
return webpage.Html;
}
static void ScrapeNamesAndPrices(string url)
{
var html = GetHtml(url);
var nameNodes = html.OwnerDocument.DocumentNode.SelectNodes("//a/h2");
var priceNodes = html.OwnerDocument.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
foreach (var (n, p) in nameNodes.Zip(priceNodes))
{
Console.WriteLine("Name: {0} , Price: {1}", n.InnerText, p.InnerText);
}
}
}
}
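ScrapySharp also ships jQuery-style CSS selectors through its CssSelect() extension method, which you can use in place of the XPath queries above. A minimal sketch, assuming a using ScrapySharp.Extensions; directive:
// CSS-selector alternative to the XPath queries above (requires using ScrapySharp.Extensions;).
var html = GetHtml("https://scrapeme.live/shop/");
var nameNodes = html.CssSelect("h2");
var priceNodes = html.CssSelect("span.price");
foreach (var (n, p) in nameNodes.Zip(priceNodes))
{
    Console.WriteLine("Name: {0} , Price: {1}", n.InnerText, p.InnerText);
}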
6. Iron Web Scraper
IronWebScraper is a .NET C# web scraping library used to extract and parse data from internet sources. It's capable of controlling permitted and disallowed objects, sites, media and other elements.
Other features include its ability to manage multiple identities and a web cache. On the downside:
- It doesn't support crawling dynamic content.
- It requires manual proxy integration.
How to Scrape a Web Page in C# with Iron Web Scraper
Setting up the development environment for this C# crawler library is pretty simple. Just install the IronWebScraper package in your C# Console Project and you're good to go!

The WebScraper class in IronWebScraper has two abstract methods: Init and Parse. The Init method initializes the web request and fetches the response. This response is then passed to the Parse function to extract the required elements from the HTML content.
To create your Iron web scraper, you must inherit the WebScraper class and override these two abstract methods. If you have IronScraperDemo as your main scraper class in the Program.cs file, implement the WebScraper class like this:
class IronScraperDemo : WebScraper
{
public override void Init()
{
License.LicenseKey = "ENTER YOUR LICENSE KEY HERE";
this.LoggingLevel = WebScraper.LogLevel.All;
this.Request("https://scrapeme.live/shop/", Parse);
}
public override void Parse(Response response)
{
var names = response.Css("h2");
var prices = response.Css("span.price");
for(int i=0; i<names.Length;i++)
{
Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText);
}
}
}
The Init method requires you to add a license key, which you can get by creating an account on their website. After requesting the target URL, Init passes the received response to the Parse method. In our case, the Parse method uses CSS selectors to extract the product names and prices from the response and prints them on the output console.
Congratulations! Your Iron Web Scraper is ready to work. Just add the following driver code to create an object of your crawler and call its Start() method.
static void Main(string[] args)
{
IronScraperDemo ironScraper = new IronScraperDemo();
ironScraper.Start();
}
Using the Iron Web Scraper to crawl the target webpage, your output should look like this:

Here's the complete Program.cs code for this example:
using IronWebScraper;
using System;
using System.Collections.Generic;
namespace IronScrapDemo
{
class IronScraperDemo : WebScraper
{
List<Products> products = new List<Products>();
static void Main(string[] args)
{
IronScraperDemo ironScraper = new IronScraperDemo();
ironScraper.Start();
}
public override void Init()
{
License.LicenseKey = "ENTER YOUR LICENSE KEY HERE";
this.LoggingLevel = WebScraper.LogLevel.All; // All Events Are Logged
this.Request("https://scrapeme.live/shop/", Parse);
}
public override void Parse(Response response)
{
var names = response.Css("h2");
var prices = response.Css("span.price");
for (int i=0; i < names.Length; i++)
{
Products p = new Products();
p.Name = names[i].InnerText;
p.Price = prices[i].InnerText;
products.Add(p);
}
foreach(var p in products)
{
Console.WriteLine("Name: {0}, Price: {1}", p.Name, p.Price);
}
}
}
class Products
{
public String Name
{
get;
set;
}
public String Price
{
get;
set;
}
}
}
Note: you still need to set up the development environment as per the instructions in this section.
7. HttpClient
HttpClient is a built-in C# HTTP client with async support that you can use to fetch the raw HTML of a target URL. However, you still need an HTML parsing tool to extract the desired data.
How to Scrape a Web Page in C# with HttpClient
HttpClient is a built-in .NET class and doesn't require any external assembly, but you do have to install HTML Agility Pack as an external dependency through the Package Manager.
To get started, include the following assemblies in the "Program.cs" file under the C# Console Application:
using System;
using System.Threading.Tasks;
using System.Net.Http;
using HtmlAgilityPack;
The System.Threading.Tasks namespace provides the types used by the asynchronous code, System.Net.Http is needed to send the HTTP requests, and HtmlAgilityPack is used to parse the retrieved HTML content.
Add a GetHtmlContent() method to the Program class, like this:
private static Task<string> GetHtmlContent()
{
var hclient = new HttpClient();
return hclient.GetStringAsync("https://scrapeme.live/shop");
}
You can pass this string response to the ParseHtml() method to extract and show the desired data.
private static void ParseHtml(string html)
{
var doc = new HtmlDocument();
doc.LoadHtml(html);
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2");
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
for (int i = 0; i < names.Count; i++){
Console.WriteLine("Name: {0}, Price: {1}", names[i].InnerText, prices[i].InnerText);
}
}
The ParseHtml() method above takes an HTML string as input, parses it using the SelectNodes() method, and displays the names and prices on the output console. Here the SelectNodes() function plays the key parsing role: it extracts only the relevant elements from the HTML string according to the given XPath selectors.
To wrap it up, let's look at the driver code that executes everything in order:
static async Task Main(string[] args)
{
var html = await GetHtmlContent();
ParseHtml(html);
}
Notice that the Main() function is now an async method. The reason is that an awaited call can only appear inside an async method, so Main() has to be asynchronous.
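In a real scraper you'd typically reuse a single HttpClient instance across requests instead of creating a new one each time, and set a User-Agent header, since some sites reject requests without one. Here's a minimal sketch; the header value is an arbitrary example:
// A single, shared HttpClient reused across requests.
private static readonly HttpClient hclient = new HttpClient();
private static Task<string> GetHtmlContent(string url)
{
    // Some sites reject requests with no User-Agent; this value is an arbitrary example.
    if (!hclient.DefaultRequestHeaders.Contains("User-Agent"))
    {
        hclient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
    }
    return hclient.GetStringAsync(url);
}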
And like the other C# web scraping libraries discussed in this article, your output will be:

Here's what all the code chunks joined together look like:
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
namespace ConsoleApp1
{
class Program
{
static async Task Main(string[] args)
{
var html = await GetHtmlContent();
List<Products> products = ParseHtml(html);
foreach (var p in products)
{
Console.WriteLine("Name: {0} , Price: {1}", p.Name, p.Price);
}
}
private static Task<string> GetHtmlContent()
{
var hclient = new HttpClient();
return hclient.GetStringAsync("https://scrapeme.live/shop");
}
private static List<Products> ParseHtml(string html)
{
var doc = new HtmlDocument();
doc.LoadHtml(html);
HtmlNodeCollection names = doc.DocumentNode.SelectNodes("//a/h2");
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//div/main/ul/li/a/span");
List<Products> products = new List<Products>();
for (int i = 0; i < names.Count && i < prices.Count; i++)
{
Products p = new Products();
p.Name = names[i].InnerText;
p.Price = prices[i].InnerText;
products.Add(p);
}
return products;
}
}
class Products
{
public String Name
{
get;
set;
}
public String Price
{
get;
set;
}
}
}
Conclusion
In this article, we went over the 7 best C# web scraping libraries:
- ZenRows.
- Puppeteer Sharp.
- Selenium.
- HTML Agility Pack.
- Scrapy Sharp.
- Iron Web Scraper.
- HttpClient.
A common challenge among scrapers is their inability to crawl a web page without triggering anti-bots. ZenRows solves this scraping problem by handling all anti-bot bypass for you, taking away the headaches involved. You can test ZenRows for free to see what it's capable of.
Frequently Asked Questions
What is the best C# web scraping library?
ZenRows API is the best option for web scraping in C#. Unlike other C# web scraping libraries, it doesn't require manual proxy integration or setting up anti-bot logic. It also supports scraping dynamic content and offers auto-parsing features.
What is the most popular C# library for web scraping?
HTML Agility Pack is the most popular library, with around 83 million downloads. The main reason for its popularity is its HTML parser, which can download web pages directly or through a browser.
What is a good C# web scraping library?
A good scraping library provides efficient mechanisms to scrape and parse content from targets while avoiding blocks (including anti-bots and CAPTCHAs) and without revealing your IP. Additionally, it should support scraping dynamic content and have comprehensive documentation and active community support. With these criteria in view, ZenRows, Puppeteer Sharp, and Selenium are good options for scraping the web with C#.
Which libraries are used for web scraping in C#?
- ZenRows Web Scraper API.
- Puppeteer Sharp.
- Selenium Web Driver.
- HTML Agility Pack.
- Scrapy Sharp.
- Iron Web Scraper.
- HttpClient.