Web Crawling Webinar for Tech Teams

Playwright in C# for Web Scraping: Step-by-Step Tutorial

June 24, 2024 · 12 min read

Playwright is a popular browser automation library for both testing and web scraping. The tool is available in many programming languages, including C#. No wonder the Playwright C# package is a favorite tool for headless browser scripting in .NET.

In this guide, you'll see the basics of Playwright with C# and then take a look at more complex interactions, from scrolling an infinite-scroll page and waiting for elements to avoiding anti-bot blocks.

Let's dive in!

Why Use Playwright in C#

Playwright is a rising star in the browser automation ecosystem. Backed by Microsoft, the library lets you control different browsers through the same API. That opens the door to multi-platform and multi-language testing and web scraping in C#.

C# is one of the languages officially supported by the project. That means the C# Playwright library is always up-to-date and has access to the latest features. For that reason and its rich API, it's one of the best resources for web automation in C#.

How to Use Playwright in C#

To take your first steps with Playwright in C#, let's target the following infinite scrolling demo page:

infinite scrolling demo page

This page is a dynamic content page that progressively loads new data as the user scrolls down. Scraping data from it requires a browser automation tool that can run JavaScript and simulate user interaction, such as Playwright.

Follow the steps below and learn how to use Playwright with C# to retrieve data from it!

Step 1: Install Playwright

Make sure the .NET SDK is installed on your machine. Download the .NET installer, launch it, and follow the instructions.

You now have everything you need to set up a Playwright C# project. Open the terminal, create a PlaywrightCSharpProject folder, and enter it:

Terminal
mkdir PlaywrightCSharpProject
cd PlaywrightCSharpProject

Execute the new console command to initialize a C# application:

Terminal
dotnet new console

This will create a Program.cs file and other files inside the project's folder. Program.cs is the entry point of your application. You'll soon update it to contain the Playwright web automation logic.

Then, install Playwright in C# by adding the Microsoft.Playwright package to your project's dependencies:

Terminal
dotnet add package Microsoft.Playwright

Build the project so that Playwright will create the scripts you need to complete the installation:

Terminal
dotnet build

Run the script below to download the required browser executables:

Terminal
pwsh bin/Debug/net<version>/playwright.ps1 install

Replace <version> with your version of .NET, like 7.0, 8.0, 9.0, etc.

Prepare your Program.cs script by importing Playwright and adding an async Main() function:

program.cs
using Microsoft.Playwright;

class Program
{
    static async Task Main(string[] args)
    {
        // scraping logic...
    }
}

Keep in mind that you can run your C# Playwright script with this command:

Terminal
dotnet run

Well done! You're ready to scrape some data using Playwright with C#.

Step 2: Get the Source HTML With Playwright

Paste the following lines into Main(). These will initialize Playwright, launch a Chromium window, and open a new page:

program.cs
// initialize a Playwright instance to
// perform browser automation
using var playwright = await Playwright.CreateAsync();

// initialize a Chromium instance
await using var browser = await playwright.Chromium.LaunchAsync(new()
{
    Headless = true, // set to "false" while developing
});
// open a new page within the current browser context
var page = await browser.NewPageAsync();

Use the GotoAsync() method to visit the target page in the Chromium instance:

program.cs
await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/");

Then, call the ContentAsync() method to retrieve the source HTML of the page as a string. Print it in the terminal with Console.WriteLine():

program.cs
var html = await page.ContentAsync();
Console.WriteLine(html);

That's what your Playwright C# example script will contain so far:

program.cs
using Microsoft.Playwright;

class Program
{
    static async Task Main(string[] args)
    {
        // initialize a Playwright instance to
        // perform browser automation
        using var playwright = await Playwright.CreateAsync();

        // initialize a Chromium instance
        await using var browser = await playwright.Chromium.LaunchAsync(new()
        {
            Headless = true, // set to "false" while developing
        });
        // open a new page within the current browser context
        var page = await browser.NewPageAsync();

        // visit the target page
        await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/");

        // retrieve the source HTML code of the page
        // and print it
        var html = await page.ContentAsync();
        Console.WriteLine(html);
    }
}

Launch it in headed mode by setting Headless to false. Playwright will open the Chromium window below and visit the Infinite Scrolling page:

Demo Page

Before terminating, the script will print the following content in the terminal:

Output
<html class="h-full"><head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="description" content="Learn to scrape infinite scrolling pages"><title>Scraping Infinite Scrolling Pages (Ajax) | ScrapingClub</title>
  <link rel="icon" href="/static/img/icon.611132651e39.png" type="image/png">
  <!-- Omitted for brevity... -->

Awesome, that's the HTML code of the target page!

See how to scrape data from that page using Playwright in C# as a next step.

Step 3: Extract the Data You Need

One of the most useful features of Playwright in C# is the ability to parse the HTML code of a web page. This allows you to extract data from a site, which is what web scraping is all about!

Assume that your goal is to collect the name and price of each product on the page. To achieve that, you need to follow this 3-step procedure:

  1. Select the product HTML elements on the page using a DOM selection strategy.
  2. Retrieve the information of interest from each of them.
  3. Store the collected data in a C# data structure.

A DOM selection strategy usually relies on a CSS selector or an XPath expression. These are the two most popular ways to select HTML elements on a page when it comes to web scraping.

CSS selectors are simple and intuitive, while XPath expressions are more powerful but also harder to define. Find out more in our comparison article on CSS Selector vs XPath.

Let's keep things simple and go for CSS selectors!
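As a quick illustration of the two strategies, both of the following locators would target the same elements. The `div.product` selector here is hypothetical, not taken from the demo page:

```csharp
// CSS selector: short and readable
var byCss = page.Locator("css=div.product");

// equivalent XPath expression: more verbose, but it supports
// predicates CSS can't express (e.g., matching by text content)
var byXPath = page.Locator("xpath=//div[contains(@class, 'product')]");
```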

The first step in devising an effective CSS selector strategy is to study the HTML code of the page. So, visit the target page in your browser and inspect a product HTML node using the DevTools:

DevTools Inspection

Expand the HTML code of the element and notice that you can select all product cards with this CSS selector:

program.cs
.post

Given a product element, you can find:

  • The product name in an inner <h4> element.
  • The product price in an inner <h5> element.

This information is enough to implement the scraping logic using Playwright with C#. Before doing so, add a global Product class for the data you want to extract from the product elements:

program.cs
class Product
{
    public string? Name { get; set; }
    public string? Price { get; set; }
}

Then, initialize an empty list of type Product in Main(). At the end of the script, this list will store all scraped data:

program.cs
var products = new List<Product>();

Now, learn how to scrape the name and price of each product on the target page!

Use the Locator() method to get all product HTML elements on the page. Thanks to the css=<css_selector> syntax, you can instruct C# Playwright to apply a CSS selector on the DOM:

program.cs
var productHTMLElements = page.Locator("css=.post");

Iterate over the product nodes and apply the data extraction logic:

program.cs
for (var index = 0; index < await productHTMLElements.CountAsync(); index++)
{
    // get the current product HTML element
    var productHTMLElement = productHTMLElements.Nth(index);

    // retrieve the name and price
    var name = (await productHTMLElement.Locator("h4").TextContentAsync())?.Trim();
    var price = (await productHTMLElement.Locator("h5").TextContentAsync())?.Trim();

    // create a new Product instance and
    // add it to the list
    var product = new Product { Name = name, Price = price };
    products.Add(product);
}

TextContentAsync() returns the text of the current element. This may contain whitespace characters, so remove them with Trim().

Verify that products contains the data scraped with:

program.cs
foreach (var product in products)
{
    Console.WriteLine($"Name: {product.Name}, Price: {product.Price}");
}

Integrate the above logic into Program.cs, and you'll get:

program.cs
using Microsoft.Playwright;

class Program
{
    // a custom class matching data to scrape
    // from the product HTML elements
    class Product
    {
        public string? Name { get; set; }
        public string? Price { get; set; }
    }

    static async Task Main(string[] args)
    {
        // initialize a Playwright instance to
        // perform browser automation
        using var playwright = await Playwright.CreateAsync();

        // initialize a Chromium instance
        await using var browser = await playwright.Chromium.LaunchAsync(new()
        {
            Headless = true, // set to "false" while developing
        });
        // open a new page within the current browser context
        var page = await browser.NewPageAsync();

        // visit the target page
        await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/");

        // where to store the scraped data
        var products = new List<Product>();

        // select all product HTML elements on the page
        var productHTMLElements = page.Locator("css=.post");

        // iterate over the product elements
        // and apply the scraping logic
        for (var index = 0; index < await productHTMLElements.CountAsync(); index++)
        {
            // get the current product HTML element
            var productHTMLElement = productHTMLElements.Nth(index);

            // retrieve the name and price
            var name = (await productHTMLElement.Locator("h4").TextContentAsync())?.Trim();
            var price = (await productHTMLElement.Locator("h5").TextContentAsync())?.Trim();

            // create a new Product instance and
            // add it to the list
            var product = new Product { Name = name, Price = price };
            products.Add(product);
        }

        // print the scraped data
        foreach (var product in products)
        {
            Console.WriteLine($"Name: {product.Name}, Price: {product.Price}");
        }
    }
} 

Run the Playwright C# script, and it'll produce the following output:

Output
Name: Short Dress, Price: $24.99
Name: Patterned Slacks, Price: $29.99
// omitted for brevity...
Name: Short Lace Dress, Price: $59.99
Name: Fitted Dress, Price: $34.99

Fantastic! Your C# Playwright scraping logic works like a charm. It only remains to export the retrieved data to a human-readable format.

Step 4: Export Data as CSV

C# comes with a complete I/O API to write data to a file. However, the easiest way to export data to CSV is through an external library. In particular, CsvHelper is the most popular .NET package for reading and writing CSV files.

Add the CsvHelper NuGet package to your project's dependencies:

Terminal
dotnet add package CsvHelper

Import it by adding this line to the top of your Program.cs script:

program.cs
using CsvHelper;

Create a products.csv file and populate it using a CsvWriter object from CsvHelper. The WriteRecords() method will convert the product objects to CSV format and add them to the file:

program.cs
// create the CSV output file
using (var writer = new StreamWriter("products.csv"))
// instantiate the CSV writer
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    // populate the CSV file
    csv.WriteRecords(products);
}

CultureInfo.InvariantCulture guarantees that any software can read the produced CSV. Import CultureInfo for the CSV export logic to work:

program.cs
using System.Globalization;

Put it all together, and you'll get the following scraper:

program.cs
using Microsoft.Playwright;
using CsvHelper;
using System.Globalization;

class Program
{
    // a custom class matching data to scrape
    // from the product HTML elements
    class Product
    {
        public string? Name { get; set; }
        public string? Price { get; set; }
    }

    static async Task Main(string[] args)
    {
        // initialize a Playwright instance to
        // perform browser automation
        using var playwright = await Playwright.CreateAsync();

        // initialize a Chromium instance
        await using var browser = await playwright.Chromium.LaunchAsync(new()
        {
            Headless = true, // set to "false" while developing
        });
        // open a new page within the current browser context
        var page = await browser.NewPageAsync();

        // visit the target page
        await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/");

        // where to store the scraped data
        var products = new List<Product>();

        // select all product HTML elements on the page
        var productHTMLElements = page.Locator("css=.post");

        // iterate over the product elements
        // and apply the scraping logic
        for (var index = 0; index < await productHTMLElements.CountAsync(); index++)
        {
            // get the current product HTML element
            var productHTMLElement = productHTMLElements.Nth(index);

            // retrieve the name and price
            var name = (await productHTMLElement.Locator("h4").TextContentAsync())?.Trim();
            var price = (await productHTMLElement.Locator("h5").TextContentAsync())?.Trim();

            // create a new Product instance and
            // add it to the list
            var product = new Product { Name = name, Price = price };
            products.Add(product);
        }

        // create the CSV output file
        using (var writer = new StreamWriter("products.csv"))
        // instantiate the CSV writer
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            // populate the CSV file
            csv.WriteRecords(products);
        }
    }
}

Run the C# Playwright script:

Terminal
dotnet run

When the script execution ends, a products.csv file will appear in the project's folder. Open it, and you'll see this data:

Products CSV File

Brilliant! You now know the basics of Playwright with C#.

At the same time, the current output isn't complete. The CSV file contains only ten records, even though the target page has many more products than that. It initially shows a few products but uses infinite scrolling to load more. Follow the next section and learn how to tackle that!

How to Interact With a Browser in C# Playwright

Playwright can simulate several interactions, such as scrolls, waits, mouse movements, and more. That's the key to programmatic interaction with dynamic content pages. Browser automation is great for simulating human behavior, helping you bypass anti-bot measures.

The interactions supported by Playwright with C# include:

  • Click on elements and hover them.
  • Move the mouse and drag and drop nodes.
  • Wait for elements on the page to be present, visible, clickable, etc.
  • Fill out and empty input fields.
  • Scroll up and down the page.
  • Submit forms.
  • Take screenshots.

You can perform most of these operations through built-in methods. In all other scenarios, use EvaluateAsync() to run a JavaScript script directly on the page. With either approach, you can mimic any user interaction in the controlled browser.
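For instance, EvaluateAsync() can run an arbitrary JavaScript snippet on the page and even return its result to C#. A minimal sketch:

```csharp
// run JavaScript in the page context and read the result back in C#
var title = await page.EvaluateAsync<string>("() => document.title");
Console.WriteLine(title);

// or trigger an interaction that has no dedicated built-in method,
// such as scrolling to the bottom of the page
await page.EvaluateAsync("window.scrollTo(0, document.body.scrollHeight)");
```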

Let's learn how to scrape all product data from the infinite scroll demo page. Then, you'll explore other interactions in dedicated Playwright C# examples!

Scrolling

The page returned by the server only contains ten product cards. Users can apply the infinite scrolling interaction to load more products. To scrape all products on the site, you must simulate those scroll actions.

Playwright doesn't come with a built-in method for scrolling down the page. So, you need to write custom JavaScript logic as in the snippet below. This instructs the browser to scroll down the page 10 times at an interval of 500 ms:

program.cs
// scroll down the page 10 times
const scrolls = 10
let scrollCount = 0

// scroll down and then wait for 0.5s
const scrollInterval = setInterval(() => {
  window.scrollTo(0, document.body.scrollHeight)
  scrollCount++

  if (scrollCount === scrolls) {
    clearInterval(scrollInterval)
  }
}, 500)

Store that script in a multi-line string variable. Then, pass it to the EvaluateAsync() method to simulate infinite scrolling:

program.cs
var jsScrollScript = @"
    const scrolls = 10
    let scrollCount = 0

    // scroll down and then wait for 0.5s
    const scrollInterval = setInterval(() => {
    window.scrollTo(0, document.body.scrollHeight)
    scrollCount++

    if (scrollCount === scrolls) {
        clearInterval(scrollInterval)
    }
    }, 500)
";
await page.EvaluateAsync(jsScrollScript);

Place these instructions before the locator logic. That's because you must let the browser load new products before selecting them.

Executing the JavaScript script on the page won't be enough. You also need to wait for the scrolling logic to end and for the new products to be on the page. To do so, use WaitForTimeoutAsync() to stop the script execution for 10 seconds:

program.cs
await page.WaitForTimeoutAsync(10000);

Here's what your Program.cs file will look like:

program.cs
using Microsoft.Playwright;
using CsvHelper;
using System.Globalization;

class Program
{
    // a custom class matching data to scrape
    // from the product HTML elements
    class Product
    {
        public string? Name { get; set; }
        public string? Price { get; set; }
    }

    static async Task Main(string[] args)
    {
        // initialize a Playwright instance to
        // perform browser automation
        using var playwright = await Playwright.CreateAsync();

        // initialize a Chromium instance
        await using var browser = await playwright.Chromium.LaunchAsync(new()
        {
            Headless = true, // set to "false" while developing
        });
        // open a new page within the current browser context
        var page = await browser.NewPageAsync();

        // visit the target page
        await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/");

        // where to store the scraped data
        var products = new List<Product>();

        // scrolling logic in JavaScript
        var jsScrollScript = @"
            const scrolls = 10
            let scrollCount = 0

            // scroll down and then wait for 0.5s
            const scrollInterval = setInterval(() => {
            window.scrollTo(0, document.body.scrollHeight)
            scrollCount++

            if (scrollCount === scrolls) {
                clearInterval(scrollInterval)
            }
            }, 500)
        ";
        // execute the JS scrolling script on the page
        await page.EvaluateAsync(jsScrollScript);

        // wait for 10 seconds for the product elements
        // to be loaded on the page
        await page.WaitForTimeoutAsync(10000);

        // select all product HTML elements on the page
        var productHTMLElements = page.Locator("css=.post");

        // iterate over the product elements
        // and apply the scraping logic
        for (var index = 0; index < await productHTMLElements.CountAsync(); index++)
        {
            // get the current product HTML element
            var productHTMLElement = productHTMLElements.Nth(index);

            // retrieve the name and price
            var name = (await productHTMLElement.Locator("h4").TextContentAsync())?.Trim();
            var price = (await productHTMLElement.Locator("h5").TextContentAsync())?.Trim();

            // create a new Product instance and
            // add it to the list
            var product = new Product { Name = name, Price = price };
            products.Add(product);
        }

        // create the CSV output file
        using (var writer = new StreamWriter("products.csv"))
        // instantiate the CSV writer
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            // populate the CSV file
            csv.WriteRecords(products);
        }
    }
}

Execute the script again:

Terminal
dotnet run

Be patient as the script will now be much slower due to the 10-second interruption.

products should now contain much more than just 10 elements. Verify that by opening the products.csv file produced by the Playwright C# scraper. That's what you'll see:

Updated Products CSV File

Congratulations! You just scraped all products on the target page.

Wait for Element

The current solution produces the desired results, but it relies on WaitForTimeoutAsync(). The use of that method is discouraged even in the official documentation.

The reason is quite simple. What happens in case of a browser or network slowdown? 10 seconds may not be enough for all products to be loaded on the page! In general, hard waits make your browser automation logic flaky. That's why you should never use them in real-world scripts.

Waiting for a fixed number of seconds also makes your scraping logic slow. To avoid these issues, Playwright in C# automatically waits for several conditions before performing an action on a node. For example, it waits for the DOM element to be on the page and in the ready state.

Use the ToBeVisibleAsync() assertion to wait up to 10 seconds for the 60th .post element to be on the DOM. By default, Playwright applies a 5-second timeout to each assertion. You can override that with the Timeout option as below:

program.cs
await Assertions.Expect(page.Locator("css=.post:nth-child(60)"))
                .ToBeVisibleAsync(new() { Timeout = 10000 });

Replace the WaitForTimeoutAsync() call with the instruction above. The script will now wait for the page to render all 60 products retrieved via AJAX after the scrolls.

The definitive code of your script using Playwright with C# will be:

program.cs
using Microsoft.Playwright;
using CsvHelper;
using System.Globalization;

class Program
{
    // a custom class matching data to scrape
    // from the product HTML elements
    class Product
    {
        public string? Name { get; set; }
        public string? Price { get; set; }
    }

    static async Task Main(string[] args)
    {
        // initialize a Playwright instance to
        // perform browser automation
        using var playwright = await Playwright.CreateAsync();

        // initialize a Chromium instance
        await using var browser = await playwright.Chromium.LaunchAsync(new()
        {
            Headless = true, // set to "false" while developing
        });
        // open a new page within the current browser context
        var page = await browser.NewPageAsync();

        // visit the target page
        await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/");

        // where to store the scraped data
        var products = new List<Product>();

        // scrolling logic in JavaScript
        var jsScrollScript = @"
            const scrolls = 10
            let scrollCount = 0

            // scroll down and then wait for 0.5s
            const scrollInterval = setInterval(() => {
            window.scrollTo(0, document.body.scrollHeight)
            scrollCount++

            if (scrollCount === scrolls) {
                clearInterval(scrollInterval)
            }
            }, 500)
        ";
        // execute the JS scrolling script on the page
        await page.EvaluateAsync(jsScrollScript);

        // wait up to 10 seconds for the 60th product
        // to be on the page
        await Assertions.Expect(page.Locator("css=.post:nth-child(60)"))
                        .ToBeVisibleAsync(new() { Timeout = 10000 });

        // select all product HTML elements on the page
        var productHTMLElements = page.Locator("css=.post");

        // iterate over the product elements
        // and apply the scraping logic
        for (var index = 0; index < await productHTMLElements.CountAsync(); index++)
        {
            // get the current product HTML element
            var productHTMLElement = productHTMLElements.Nth(index);

            // retrieve the name and price
            var name = (await productHTMLElement.Locator("h4").TextContentAsync())?.Trim();
            var price = (await productHTMLElement.Locator("h5").TextContentAsync())?.Trim();

            // create a new Product instance and
            // add it to the list
            var product = new Product { Name = name, Price = price };
            products.Add(product);
        }

        // create the CSV output file
        using (var writer = new StreamWriter("products.csv"))
        // instantiate the CSV writer
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            // populate the CSV file
            csv.WriteRecords(products);
        }
    }
}

Execute Program.cs again, and you'll get the same results as before but much faster. That's because the script no longer has to wait 10 seconds.

Et voilà! Now that you're an expert in web scraping using Playwright in C#, you're ready to explore other interactions.

Wait for Page to Load

By default, GotoAsync() waits up to 30 seconds for the page to load. Basically, the C# Playwright library already waits for pages to load for you.

The problem is that modern web pages are so dynamic that it's hard to tell when a page has fully loaded. For more complex scenarios, consider the following auto-waiting assertions:

  • ToBeAttachedAsync(): Verify if a specific element is attached to the DOM.
  • ToBeCheckedAsync(): Check if a checkbox HTML element is checked.
  • ToBeDisabledAsync(): Verify if a specific element is disabled.
  • ToBeEditableAsync(): Check if an element is editable.
  • ToBeEmptyAsync(): Verify if a container is empty.
  • ToBeEnabledAsync(): Verify if a specific element is enabled.
  • ToBeFocusedAsync(): Verify if a specific element is focused.
  • ToBeHiddenAsync(): Verify if a specific element isn't visible.
  • ToBeInViewportAsync(): Check if an element intersects the viewport.
  • ToBeVisibleAsync(): Verify if a specific element is visible.
  • ToContainTextAsync(): Verify if an element contains specific text.
  • ToHaveAttributeAsync(): Verify if an element has a specific DOM attribute.
  • ToHaveClassAsync(): Verify if an element has a specific class property.
  • ToHaveCountAsync(): Verify if a list has an exact number of children.
  • ToHaveCSSAsync(): Verify if an element has a specific CSS property.
  • ToHaveIdAsync(): Verify if an element has a specific ID.
  • ToHaveJSPropertyAsync(): Verify if an element has a specific JavaScript property.
  • ToHaveTextAsync(): Verify if an element matches specific text.
  • ToHaveValueAsync(): Verify if an input element has a specific value.
  • ToHaveValuesAsync(): Verify if a select element has specific options selected.
  • ToHaveTitleAsync(): Verify if a page has a specific title.
  • ToHaveURLAsync(): Verify if a page has a specific URL.
  • ToBeOKAsync(): Verify if a response has a 2xx status.

For more information on how to wait in Playwright C#, check out the documentation.
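If you only need a coarser signal than an element-level assertion, you can also tune what "loaded" means for GotoAsync() via its WaitUntil option, or wait explicitly after navigation. A sketch:

```csharp
// consider navigation done only when there have been
// no network connections for at least 500 ms
await page.GotoAsync("https://scrapingclub.com/exercise/list_infinite_scroll/", new()
{
    WaitUntil = WaitUntilState.NetworkIdle,
});

// or wait for a specific load state after navigating
await page.WaitForLoadStateAsync(LoadState.DOMContentLoaded);
```

Note that NetworkIdle is handy for debugging but discouraged for production scripts, where element-level waits are more reliable.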

Click Elements

Playwright locators expose the ClickAsync() method for simulating click interactions. This function instructs the browser to click on the specified node as a human user would:

program.cs
await locator.ClickAsync();

If the click action triggers a page change (as in the example below), you'll get redirected to a new page. In this case, keep in mind that you'll have a new DOM structure to deal with:

program.cs
var productHTMLElement = page.Locator("css=.post");
await productHTMLElement.ClickAsync();
// you are now on the detail product page...
    
// new scraping logic...
// page.Locator(...);
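The other interactions listed earlier follow the same locator-based pattern. A quick sketch, where the input selector is hypothetical and not taken from the demo page:

```csharp
// fill out an input field (hypothetical selector)
await page.Locator("css=input[name='query']").FillAsync("dress");

// hover the first product card
await page.Locator("css=.post").First.HoverAsync();

// take a full-page screenshot
await page.ScreenshotAsync(new() { Path = "page.png", FullPage = true });
```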

Avoid Getting Blocked When Scraping With Playwright

The biggest challenge in web scraping is getting blocked by anti-bot solutions. These measures can identify and stop automated scripts, such as your Playwright C# scraper. Sites adopt them because they know how valuable their data is, and they don't want to give it up for free.

Performing web scraping without getting blocked isn't easy, but it's possible. An effective approach to bypassing most simple anti-bots is to randomize your requests. To do so, use proxies to change the exit IP and set real-world User-Agent header values.

To set a custom user agent in Playwright with C#, pass it to NewContextAsync(). Then, use the customized browser context to open a new page:

program.cs
var customUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36";
// set a custom user agent header in the browser context
var context = await browser.NewContextAsync(new() { UserAgent = customUserAgent });
var page = await context.NewPageAsync();

Learn more about why this is useful for web scraping in our guide on User Agents for web scraping.

Configure an HTTP proxy in Playwright by passing a Proxy object to LaunchAsync(). Get the connection info of a free proxy from sites like Free Proxy List. Instantiate a Proxy object and then set it in the controlled browser:

program.cs
var proxy = new Proxy
{
    Server = "http://213.33.2.28:8076",
};
// set a custom proxy in the controlled browser
await using var browser = await playwright.Chromium.LaunchAsync(new()
{
    Proxy = proxy
});

All browser requests will be routed through the specific proxy. The problem is that free proxies are data-greedy, short-lived, and unreliable. By the time you read this article, the proxy server above will no longer work.

Those two approaches are just baby steps to elude anti-bot technologies. Sophisticated tools like Cloudflare, DataDome, PerimeterX, and other WAF systems will still be able to detect your script as a bot with no effort:

Blocked G2 Page

Time to give up? Hell, no! You just need the right tool and its name is ZenRows. As a next-generation scraping API, it provides the most powerful bot bypass toolkit and User-Agent and IP rotation capabilities.

Try ZenRows. Sign up for free and you'll get to the Request Builder page below:

building a scraper with zenrows

Suppose you want to extract data from the protected G2.com page seen earlier.

  1. Paste the target URL (https://www.g2.com/products/airtable/reviews) into the "URL to Scrape" input.
  2. Enable the “JS Rendering” mode.
  3. Click on "Premium Proxy" to enable IP rotation (User-Agent rotation and the AI-powered anti-bot toolkit are included by default).
  4. Select the “C#” option on the right and then the “API” mode to get the snippet required to use ZenRows in C#.

You'll get this code:

program.cs
using RestSharp;

namespace TestApplication {
    class Test {
        static void Main(string[] args) {
            var client = new RestClient("https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fairtable%2Freviews&js_render=true&premium_proxy=true");
            var request = new RestRequest();

            var response = client.Get(request);
            Console.WriteLine(response.Content);
        }
    }
}

Install the RestSharp HTTP client library:

Terminal
dotnet add package RestSharp

Next, launch the script. It'll print the source HTML of the target Cloudflare-protected page:

Output
<!DOCTYPE html>
<head>
  <meta charset="utf-8" />
  <link href="https://www.g2.com/assets/favicon-fdacc4208a68e8ae57a80bf869d155829f2400fa7dd128b9c9e60f07795c4915.ico" rel="shortcut icon" type="image/x-icon" />
  <title>Airtable Reviews 2024: Details, Pricing, &amp; Features | G2</title>
  <!-- omitted for brevity ... -->

Wow! Bye-bye CAPTCHAS and 403 error pages. You just integrated ZenRows into Playwright with C#.

As a SaaS solution, ZenRows also introduces significant savings in terms of machine costs!

Conclusion

In this Playwright C# tutorial, you saw the fundamentals of browser automation in .NET. You learned the basics of controlling headless Chromium and then dug into more advanced techniques. You've become a C# Playwright expert!

Now you know:

  • How to set up a project based on Playwright with C#.
  • How to use it to extract data from a dynamic content page.
  • What user interactions you can simulate in Playwright.
  • The challenges of scraping online data and how to address them.

No matter how complex your browser automation script is, anti-bots can still block it. Bypass them all with ZenRows, a web scraping API with browser automation functionality, IP rotation, and the most powerful anti-scraping toolkit. Scraping has never been easier. Try ZenRows for free!
