
Concurrency in C#: Step-by-Step Tutorial 2024

November 13, 2023 · 11 min read

Concurrency in C# is performing operations in parallel to save time. While powerful, it's also complex, so let's see how to do it step by step in this tutorial.

What Is Concurrency in C#?

Concurrency in C# refers to the ability of an application to run more than one operation at the same time without blocking the main thread. It allows multiple tasks to progress independently, leading to better performance.

๐Ÿ‘ Pros of concurrency:

  • Improved performance, with significantly reduced execution time.
  • Reduced idle time.

👎 Cons of concurrency:

  • Increases resource use, which can lead to system overloads.
  • Developing, debugging, and maintaining concurrent code is more challenging.
  • Introduces overhead.
  • Race conditions can lead to unpredictable behavior.
  • Not all methods from the C# standard API are thread-safe.
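
To make the race-condition point concrete, here's a minimal sketch of a shared counter updated from several threads. The CounterDemo class and its CountSafely() method are hypothetical names used only for illustration. It relies on Interlocked.Increment() from System.Threading, which performs the update atomically, so no increment gets lost.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class CounterDemo
{
  // Increments a shared counter from several threads.
  // Interlocked.Increment makes each update atomic, so no increment is lost.
  public static int CountSafely(int threadCount, int incrementsPerThread)
  {
    int counter = 0;
    var threads = new List<Thread>();

    for (int i = 0; i < threadCount; i++)
    {
      var thread = new Thread(() =>
      {
        for (int j = 0; j < incrementsPerThread; j++)
        {
          // a plain "counter++" here would be an unsynchronized
          // read-modify-write, which is a race condition
          Interlocked.Increment(ref counter);
        }
      });
      threads.Add(thread);
      thread.Start();
    }

    // wait for every thread before reading the final value
    foreach (var thread in threads)
    {
      thread.Join();
    }

    return counter;
  }
}
```

If you replace the Interlocked call with counter++, the returned value may end up lower than threadCount * incrementsPerThread, and it can change from run to run.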

When to Implement Concurrency

Consider C# concurrency every time your application needs to handle many I/O operations. You can generally run file I/O actions, network requests, and database queries in parallel to improve performance.

Another scenario where concurrency is beneficial is in applications involving CPU-bound operations, such as heavy computations or data processing. Here, the idea is to divide the load into smaller units and execute them concurrently.
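
As a rough illustration of dividing a CPU-bound load into smaller units, the sketch below sums an array in chunks. ChunkedSum and Compute() are hypothetical names, and the chunking strategy is just one reasonable assumption. It uses Task.Run(), which this tutorial covers later in the section on tasks.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkedSum
{
  // Splits the array into roughly equal chunks, sums each chunk on the
  // thread pool, then combines the partial results.
  public static long Compute(int[] data, int chunkCount)
  {
    if (chunkCount < 1) throw new ArgumentOutOfRangeException(nameof(chunkCount));

    int chunkSize = (data.Length + chunkCount - 1) / chunkCount;
    var partials = new Task<long>[chunkCount];

    for (int i = 0; i < chunkCount; i++)
    {
      // copy the bounds into locals so each lambda captures its own values
      int start = Math.Min(i * chunkSize, data.Length);
      int end = Math.Min(start + chunkSize, data.Length);

      partials[i] = Task.Run(() =>
      {
        long sum = 0;
        for (int j = start; j < end; j++)
        {
          sum += data[j];
        }
        return sum;
      });
    }

    // block until every partial sum is ready
    Task.WaitAll(partials);
    return partials.Sum(t => t.Result);
  }
}
```

Each chunk touches a separate slice of the array and keeps its own local sum, so the units don't need any locking.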

How to Use Concurrency in C#

Let's start with a sequential script and make it work in parallel to learn how to use concurrency in C#.

Step 1: Start with a Non-concurrent Script

Suppose you have the following C# script, which visits 5 web pages sequentially:

scraper.cs

public class Program
{
  public static void Main()
  {
    // URLs of the pages to visit
    var pageURLs = new List<string> {
                "https://scrapeme.live/shop/page/1/",
                "https://scrapeme.live/shop/page/2/",
                "https://scrapeme.live/shop/page/3/",
                "https://scrapeme.live/shop/page/4/",
                "https://scrapeme.live/shop/page/5/"
            };

    // initialize the common HTTP client to make
    // all the requests
    HttpClient client = new HttpClient();

    // perform the requests sequentially
    foreach (var pageURL in pageURLs)
    {
      var response = client.GetAsync(pageURL).Result;
      Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
    }

    // dispose the HTTP client
    client.Dispose();
  }
}

Run it:

Terminal
dotnet run

It'll produce the following output. The C# program executes each HTTP request to the specified URL in sequence and in the expected order.

Output
Request to 'https://scrapeme.live/shop/page/1/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/2/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/3/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/4/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/5/' completed with status code 'OK'!

Now, add the following logic to measure the code execution time:

scraper.cs
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

// operation to measure the time

stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");

Stopwatch is a C# standard class that exposes methods and properties to accurately measure elapsed time. It comes from System.Diagnostics, so you'll also need the import below:

scraper.cs
using System.Diagnostics;

Put it all together:

scraper.cs
using System.Diagnostics;

public class Program
{
  public static void Main()
  {
    // to measure the time required by the script
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // URLs of the pages to visit
    var pageURLs = new List<string> {
                "https://scrapeme.live/shop/page/1/",
                "https://scrapeme.live/shop/page/2/",
                "https://scrapeme.live/shop/page/3/",
                "https://scrapeme.live/shop/page/4/",
                "https://scrapeme.live/shop/page/5/"
            };

    // initialize the common HTTP client to make
    // all the requests
    HttpClient client = new HttpClient();

    // perform the requests sequentially
    foreach (var pageURL in pageURLs)
    {
      var response = client.GetAsync(pageURL).Result;
      Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
    }

    // dispose the HTTP client
    client.Dispose();

    // get the elapsed time in seconds
    stopwatch.Stop();
    double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
    Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
  }
}

Launch the script, and you'll notice this extra message:

Output
...
Elapsed time: 4.721s

Great! The scraper visited five pages in about 4.7 seconds, so each request takes around one second.

But what if you want the script to make the requests in parallel? Read on to learn how!


Step 2: Create Threads for Concurrent Requests

You now want each HTTP request to get executed by an independent thread. The first step is to isolate the request execution logic into a function:

scraper.cs
private static void ProcessRequest(HttpClient client, string pageURL)
{
  var response = client.GetAsync(pageURL).Result;
  Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}

Then, you need to create a list of threads. Each one will execute an HTTP request through the ProcessRequest() function.

scraper.cs
List<Thread> threads = new List<Thread>();

In C#, the constructor for the Thread class accepts a function that represents the task to be executed on the thread. Iterate over each page URL, create a new Thread, and add it to the list:

scraper.cs
foreach (var pageURL in pageURLs)
{
  Thread thread = new Thread(() =>
        {
          ProcessRequest(client, pageURL);
        });
  threads.Add(thread);
}

Awesome! The threads list now contains the thread objects, ready to be started.

This is the current code of your new C# concurrency script:

scraper.cs
using System.Diagnostics;

public class Program
{
  public static void Main()
  {
    // to measure the time required by the script
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // URLs of the pages to visit
    var pageURLs = new List<string> {
                "https://scrapeme.live/shop/page/1/",
                "https://scrapeme.live/shop/page/2/",
                "https://scrapeme.live/shop/page/3/",
                "https://scrapeme.live/shop/page/4/",
                "https://scrapeme.live/shop/page/5/"
            };

    // initialize the common HTTP client to make
    // all the requests
    HttpClient client = new HttpClient();

    // initialize the list of threads
    List<Thread> threads = new List<Thread>();

    // define each thread and add it to the list
    foreach (var pageURL in pageURLs)
    {
      Thread thread = new Thread(() =>
            {
              ProcessRequest(client, pageURL);
            });
      threads.Add(thread);
    }

    // launch the threads...

    // dispose the HTTP client
    client.Dispose();

    // get the elapsed time in seconds
    stopwatch.Stop();
    double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
    Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
  }

  private static void ProcessRequest(HttpClient client, string pageURL)
  {
    var response = client.GetAsync(pageURL).Result;
    Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
  }
}

Well done! It's time to fire the threads you just defined.

Step 3: Start All Threads

To start a Thread, call its Start() method. That instructs the OS to change the state of the current thread instance to "Running".

Iterate over each thread in the threads list and launch them all:

scraper.cs
foreach (var thread in threads)
{
  thread.Start();
}

When you call thread.Start(), this is what happens behind the scenes:

  1. Thread creation: The runtime asks the operating system to create a new, dedicated thread. Unlike pooled work items, a manually created Thread isn't taken from the .NET thread pool.
  2. Scheduling: The OS scheduler assigns the new thread to a CPU core. On a busy core, that requires a context switch: the state of the running thread is saved, and the state of the new thread is loaded into the CPU registers.
  3. Thread logic execution: The CPU starts executing the logic defined in the function referenced by the thread object.

Those steps occur for each thread, leading to parallel execution of the functions passed to the threads. In other words, the script will perform the five HTTP requests in parallel.

Perfect! Each thread is now running.

Keep in mind that C# threads are individual units of execution that run concurrently with the main application thread. When a new thread starts, it operates independently from the main thread. Thus, the main thread doesn't automatically wait for it to complete.

This independence is what enables parallelism. It also implies that you must explicitly wait for each thread to finish its execution. Learn how in the next step!
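
To observe that independence in isolation, here's a small sketch under a few assumptions: StartDemo and Run() are hypothetical names, and a ManualResetEventSlim is used as a gate so the outcome doesn't depend on timing. It shows that Start() returns right away while the worker hasn't done its job yet. The final Join() call, explained in the next step, is there only so the method can return a consistent result.

```csharp
using System.Threading;

public static class StartDemo
{
  // Returns what the main thread observed right after Start()
  // and after the worker was allowed to finish.
  public static (bool doneRightAfterStart, bool doneAtEnd) Run()
  {
    bool workDone = false;

    // the gate keeps the worker parked until the main thread opens it
    using var gate = new ManualResetEventSlim(false);

    var worker = new Thread(() =>
    {
      gate.Wait();
      workDone = true;
    });

    // Start() returns immediately: the worker runs on its own
    worker.Start();

    // still false, because the worker is waiting at the gate
    bool doneRightAfterStart = workDone;

    // release the worker, then wait for it explicitly
    gate.Set();
    worker.Join();

    return (doneRightAfterStart, workDone);
  }
}
```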

Step 4: Wait for Threads To Finish

The Join() method instructs the C# program to block the execution until the thread instance terminates. Call it on each thread to wait for them to complete:

scraper.cs
foreach (var thread in threads)
{
  thread.Join();
}

After calling Join(), the OS keeps executing the thread until it reaches the end of its function or raises an exception. Once the thread terminates, the OS releases its resources and the C# script can continue.

In this case, the C# program will wait for all threads to end before moving on to the next line of code.
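
If waiting forever isn't acceptable, Join() also has overloads that take a timeout and return false when the thread is still running once it elapses. Below is a hedged sketch of a helper built on Join(TimeSpan). JoinHelper and JoinAll() are hypothetical names, and sharing one time budget across all threads is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class JoinHelper
{
  // Waits for every (already started) thread, but gives up once the
  // shared time budget is spent. Returns true only if all threads
  // terminated in time.
  public static bool JoinAll(IEnumerable<Thread> threads, TimeSpan budget)
  {
    var clock = Stopwatch.StartNew();

    foreach (var thread in threads)
    {
      // time left in the budget, never negative
      TimeSpan remaining = budget - clock.Elapsed;
      if (remaining < TimeSpan.Zero)
      {
        remaining = TimeSpan.Zero;
      }

      // Join(TimeSpan) returns false if the thread is still
      // running when the timeout elapses
      if (!thread.Join(remaining))
      {
        return false;
      }
    }

    return true;
  }
}
```

In the scraper, you could call JoinHelper.JoinAll(threads, TimeSpan.FromSeconds(30)) in place of the plain Join() loop and decide what to do when it returns false.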

Congrats! You just implemented concurrency in C#. It's time to try the script.

Step 5: Launch the C# Concurrency Script

This is the complete thread-based C# script:

scraper.cs
using System.Diagnostics;

public class Program
{
  public static void Main()
  {
    // to measure the time required by the script
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // URLs of the pages to visit
    var pageURLs = new List<string> {
                "https://scrapeme.live/shop/page/1/",
                "https://scrapeme.live/shop/page/2/",
                "https://scrapeme.live/shop/page/3/",
                "https://scrapeme.live/shop/page/4/",
                "https://scrapeme.live/shop/page/5/"
            };

    // initialize the common HTTP client to make
    // all the requests
    HttpClient client = new HttpClient();

    // initialize the list of threads
    List<Thread> threads = new List<Thread>();

    // define each thread and add it to the list
    foreach (var pageURL in pageURLs)
    {
      Thread thread = new Thread(() =>
            {
              ProcessRequest(client, pageURL);
            });
      threads.Add(thread);
    }

    // launch each thread
    foreach (var thread in threads)
    {
      thread.Start();
    }

    // wait for all threads to complete
    foreach (var thread in threads)
    {
      thread.Join();
    }

    // dispose the HTTP client
    client.Dispose();

    // get the elapsed time in seconds
    stopwatch.Stop();
    double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
    Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
  }

  private static void ProcessRequest(HttpClient client, string pageURL)
  {
    var response = client.GetAsync(pageURL).Result;
    Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
  }
}

Launch it, and you'll get an output similar to this:

Output
Request to 'https://scrapeme.live/shop/page/3/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/2/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/4/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/5/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/1/' completed with status code 'OK'!
Elapsed time: 1.676s

The responses no longer arrive in the order of the URL list, which shows that the requests ran in parallel.

Each time you run the script, you'll get a different order because it depends on which thread terminates first.

The total execution time is slightly longer than 1 second, which makes sense as each request takes about 1 second. The requests are now made in parallel, and the execution time is:

Example
Time to execute the slowest request + Time to handle threads 

Don't forget that creating and controlling threads comes at a cost in terms of time and resources. Using them is beneficial only when the time saved is greater than the overhead introduced. Here, we got a ~3x time improvement, which more than justifies thread use!
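
You can check that relationship without touching the network. The sketch below is an illustration, not a benchmark: TimingDemo is a hypothetical class, and Thread.Sleep() stands in for a blocking HTTP request. It runs the same simulated requests sequentially and then on separate threads, returning the elapsed milliseconds for each approach.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class TimingDemo
{
  // Runs the simulated requests one after the other.
  public static long RunSequential(int[] delaysMs)
  {
    var stopwatch = Stopwatch.StartNew();
    foreach (int delay in delaysMs)
    {
      // stand-in for a blocking HTTP request
      Thread.Sleep(delay);
    }
    return stopwatch.ElapsedMilliseconds;
  }

  // Runs each simulated request on its own thread.
  public static long RunParallel(int[] delaysMs)
  {
    var stopwatch = Stopwatch.StartNew();
    var threads = new List<Thread>();

    foreach (int delay in delaysMs)
    {
      var thread = new Thread(() => Thread.Sleep(delay));
      threads.Add(thread);
      thread.Start();
    }

    // the total time is driven by the slowest thread
    foreach (var thread in threads)
    {
      thread.Join();
    }
    return stopwatch.ElapsedMilliseconds;
  }
}
```

With delays of 100, 200, and 300 ms, the sequential run should take roughly the sum of the delays, while the parallel run should stay close to the longest one plus the thread handling overhead.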

Advanced C# Thread Handling With ThreadPool

The optimal number of threads to use depends on the CPU cores available, task types, and other factors. As you can imagine, determining it is complex, but manually opening threads without taking them into account can lead to system overload.

The solution is a thread pool, which is a managed collection of threads optimized for short-running tasks. It creates a specific number of threads upfront for you, and then it queues your tasks and efficiently reuses threads upon task completion.

That approach minimizes the overhead of creating and destroying threads for each task, leading to better performance and resource use.

The easiest way to deal with a thread pool in C# is through the ThreadPool class. Its QueueUserWorkItem() static method queues a function for execution with one of the threads in the pool:

scraper.cs
foreach (var pageURL in pageURLs)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        ProcessRequest(client, pageURL);
    });
}

The default size of the thread pool depends on several factors, such as how large the virtual address space is. Call the GetMaxThreads() static method to retrieve the maximum number of threads the pool can have active at the same time. To change it, use SetMaxThreads().
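
Here's a short sketch of how you might inspect those limits. PoolInfo and GetWorkerLimits() are hypothetical names. The ThreadPool methods report two values each: one for worker threads and one for asynchronous I/O completion threads.

```csharp
using System;
using System.Threading;

public static class PoolInfo
{
  // Reads the current lower and upper bounds of the thread pool
  // and returns the ones that apply to worker threads.
  public static (int minWorkers, int maxWorkers) GetWorkerLimits()
  {
    ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
    ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);

    Console.WriteLine($"Worker threads: min {minWorkers}, max {maxWorkers}");
    Console.WriteLine($"I/O completion threads: min {minIo}, max {maxIo}");

    return (minWorkers, maxWorkers);
  }
}
```

The printed numbers vary by machine and runtime. Keep in mind that SetMaxThreads() returns a bool telling you whether the change was accepted, so it's worth checking that value rather than assuming the new limit is in place.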

The main issue with ThreadPool is that it doesn't provide a method to wait for the queued work to finish. As a workaround, you can use a CountdownEvent object as below. It's a synchronization primitive that is signaled when its internal count reaches zero.

scraper.cs
CountdownEvent countdownEvent = new CountdownEvent(pageURLs.Count);

foreach (var pageURL in pageURLs)
{
  ThreadPool.QueueUserWorkItem(_ =>
  {
    ProcessRequest(client, pageURL);
    // signal that this task is completed
    countdownEvent.Signal();
  });
}

// wait for all threads to terminate
countdownEvent.Wait();

Et voilà! You're now a C# concurrency thread master!
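
One caveat with the CountdownEvent approach: if a work item throws before calling Signal(), Wait() never returns. A possible way to guard against that, sketched here with the hypothetical PoolRunner class and RunAll() method, is to signal inside a finally block and count the failures.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolRunner
{
  // Queues every job on the thread pool and blocks until all of them
  // have finished. Returns how many jobs threw an exception.
  public static int RunAll(IReadOnlyList<Action> jobs)
  {
    int failures = 0;
    using var countdown = new CountdownEvent(jobs.Count);

    foreach (var job in jobs)
    {
      ThreadPool.QueueUserWorkItem(_ =>
      {
        try
        {
          job();
        }
        catch (Exception)
        {
          // an unhandled exception on a pool thread would
          // bring the whole process down
          Interlocked.Increment(ref failures);
        }
        finally
        {
          // always signal, otherwise Wait() would block forever
          countdown.Signal();
        }
      });
    }

    countdown.Wait();
    return failures;
  }
}
```

In the scraper, each job would be a call such as () => ProcessRequest(client, pageURL).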

Use Tasks instead of Threads

The task-based asynchronous programming pattern (TAP) is another way to implement concurrency in C#. Instead of using manual threads, it allows you to perform asynchronous operations in tasks.

In C#, a Task is the core concept of the TAP and represents an asynchronous operation. It accepts a function that represents the asynchronous logic to perform. Under the hood, C# executes tasks asynchronously on the thread pool.

The benefits of using tasks over threads in C# are:

  • Higher abstraction: Tasks work on top of threads and provide a higher-level abstraction for managing async operations.
  • Improved readability: Asynchronous logic is more readable and easier to understand than concurrent code involving threads.
  • Operation chaining: Tasks make it easier to chain operations and specify what should happen next when a task terminates.
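
The higher level of abstraction is easy to see when an operation has to return a value. In this sketch (ResultDemo and its methods are hypothetical names), the thread version has to pass the result through a shared variable, while the task version carries it in the Task<int> itself.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ResultDemo
{
  private static int Square(int n) => n * n;

  // With a raw thread, the result travels through a shared variable.
  public static int SquareWithThread(int n)
  {
    int result = 0;
    var thread = new Thread(() => result = Square(n));
    thread.Start();
    thread.Join();
    return result;
  }

  // With a task, the result is part of the Task<int> object.
  public static int SquareWithTask(int n)
  {
    Task<int> task = Task.Run(() => Square(n));
    // Result blocks until the value is available
    return task.Result;
  }
}
```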

Let's now see how to use Tasks to build a concurrent script in C#!

Step 1: Define and Start the Tasks

As you did before with threads, the first step is to isolate the task logic in a new function:

scraper.cs
private static void ProcessRequest(string pageURL, HttpClient client)
{
  var response = client.GetAsync(pageURL).Result;
  Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}

Create a list of Task objects, iterate over the page URLs, and populate the list with new tasks. Task.Run() is a static method that wraps a function in a Task and queues it on the thread pool for execution. If a thread in the pool is free, the task will run immediately:

scraper.cs
// initialize the list of tasks
List<Task> tasks = new List<Task>();

// define each task and add it to the list
foreach (var pageURL in pageURLs)
{
  tasks.Add(Task.Run(() => ProcessRequest(pageURL, client)));
} 

Your current script will be:

scraper.cs
using System.Diagnostics;

public class Program
{
  public static void Main()
  {
    // to measure the time required by the script
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // URLs of the pages to visit
    var pageURLs = new List<string> {
                "https://scrapeme.live/shop/page/1/",
                "https://scrapeme.live/shop/page/2/",
                "https://scrapeme.live/shop/page/3/",
                "https://scrapeme.live/shop/page/4/",
                "https://scrapeme.live/shop/page/5/"
            };

    // initialize the common HTTP client to make
    // all the requests
    HttpClient client = new HttpClient();

    // initialize the list of tasks
    List<Task> tasks = new List<Task>();

    // define each task and add it to the list
    foreach (var pageURL in pageURLs)
    {
      tasks.Add(Task.Run(() => ProcessRequest(pageURL, client)));
    }

    // wait for tasks to complete...

    // dispose the HTTP client
    client.Dispose();

    // get the elapsed time in seconds
    stopwatch.Stop();
    double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
    Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
  }

  private static void ProcessRequest(string pageURL, HttpClient client)
  {
    var response = client.GetAsync(pageURL).Result;
    Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
  }
}

Wonderful! It only remains to wait for the tasks to terminate.

Step 2: Wait for Tasks To Complete

In C# concurrency, you have two approaches to waiting for task completion. The first involves calling the Wait() method on each task in the list:

scraper.cs
foreach (var task in tasks)
{
  task.Wait();
}

Otherwise, use the Task.WhenAll() static method. It returns a new Task that is completed when all the tasks supplied in the list have been completed.

scraper.cs
await Task.WhenAll(tasks);

The await operator suspends execution until the asynchronous operation represented by the resulting Task is completed. For await to work, the enclosing method must be marked with async.

That implies you need to change the signature of the Main method as below:

scraper.cs
public static async Task Main()

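async methods typically return a Task (or a Task<T> when they produce a value), as this is how C# represents asynchronous code. For Main specifically, the valid async return types are Task and Task<int>. The async and await keywords are at the heart of asynchronous programming in C#.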
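
Once Main is async, the request logic itself can be asynchronous too, which avoids blocking a pool thread on .Result. The sketch below illustrates the idea without touching the network: AsyncDemo, FakeRequestAsync(), and RunAllAsync() are hypothetical names, and Task.Delay() stands in for the HTTP call. In the real scraper, you'd use await client.GetAsync(pageURL) instead.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncDemo
{
  // Hypothetical stand-in for an HTTP call: waits without blocking
  // a thread, then returns a fake status for the page.
  public static async Task<string> FakeRequestAsync(string pageURL, int delayMs)
  {
    await Task.Delay(delayMs);
    return $"{pageURL} -> OK";
  }

  // Starts every request first, then awaits them all together.
  public static async Task<string[]> RunAllAsync(IReadOnlyList<string> pageURLs)
  {
    // later pages get shorter delays, so they finish first
    List<Task<string>> tasks = pageURLs
      .Select((url, index) => FakeRequestAsync(url, 50 * (pageURLs.Count - index)))
      .ToList();

    // the result array follows the order of the input tasks,
    // not the order in which they completed
    return await Task.WhenAll(tasks);
  }
}
```

Because Task.WhenAll() returns the results in the same order as the tasks you passed in, you don't need extra bookkeeping to match each response to its URL.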

Assemble the entire logic, and you'll get the following script:

scraper.cs
using System.Diagnostics;

public class Program
{
  public static async Task Main()
  {
    // to measure the time required by the script
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // URLs of the pages to visit
    var pageURLs = new List<string> {
                "https://scrapeme.live/shop/page/1/",
                "https://scrapeme.live/shop/page/2/",
                "https://scrapeme.live/shop/page/3/",
                "https://scrapeme.live/shop/page/4/",
                "https://scrapeme.live/shop/page/5/"
            };

    // initialize the common HTTP client to make
    // all the requests
    HttpClient client = new HttpClient();

    // initialize the list of tasks
    List<Task> tasks = new List<Task>();

    // define each task and add it to the list
    foreach (var pageURL in pageURLs)
    {
      tasks.Add(Task.Run(() => ProcessRequest(pageURL, client)));
    }

    // wait for all tasks to complete
    await Task.WhenAll(tasks);

    // dispose the HTTP client
    client.Dispose();

    // get the elapsed time in seconds
    stopwatch.Stop();
    double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
    Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
  }

  private static void ProcessRequest(string pageURL, HttpClient client)
  {
    var response = client.GetAsync(pageURL).Result;
    Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
  }
} 

Run it, and it'll print this:

Output
Request to 'https://scrapeme.live/shop/page/2/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/3/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/4/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/5/' completed with status code 'OK'!
Request to 'https://scrapeme.live/shop/page/1/' completed with status code 'OK'!
Elapsed time: 1.848s

Mission completed again! The URLs are not in the same order as in the code, which shows that the requests ran in parallel. The elapsed time should be comparable to the one obtained with the thread-based program.

Delaying Tasks

The Task.Delay() method introduces an intentional pause to an asynchronous operation. It's designed to add non-blocking delays within async code to control when the next operation should start.

Here's a simple example of how to use Task.Delay() to introduce a delay of two seconds in your task:

scraper.cs
public async Task DelayedOperationAsync()
{
    // perform some work

    // introduce a 2-second delay
    await Task.Delay(2000);

    // continue with the rest of the operation...
}

Introducing task delays is useful for several reasons:

  • Resource management: Task delays are useful when dealing with limited resources. They allow you to pace resource-intensive operations to avoid overloading system resources.
  • Operation orchestration: Delays can help define the sequence of tasks to run.
  • Dealing with rate limiting: Task delays can help in rate-limiting operations that interact with external services, APIs, or network resources. This prevents overwhelming the target service with too many requests in a short time.
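
As an illustration of the rate-limiting case, here's a minimal sketch that spaces out consecutive calls with Task.Delay(). The Throttle class and RunSpacedAsync() method are hypothetical, and a fixed pause between calls is only one of several possible pacing strategies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Throttle
{
  // Runs the async operation for each item in order,
  // pausing between consecutive calls.
  public static async Task<List<TResult>> RunSpacedAsync<TItem, TResult>(
    IEnumerable<TItem> items,
    Func<TItem, Task<TResult>> operation,
    TimeSpan pause)
  {
    var results = new List<TResult>();
    bool first = true;

    foreach (var item in items)
    {
      if (!first)
      {
        // non-blocking pause: no thread is held while waiting
        await Task.Delay(pause);
      }
      first = false;

      results.Add(await operation(item));
    }

    return results;
  }
}
```

For the scraper, operation could be a function that awaits client.GetAsync() for the given URL, with pause set to whatever interval the target site tolerates.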

Task Chaining with Continuation

In asynchronous programming, a common use case is to have one task invoke another operation upon completion. That concept is also known as task continuation. Continuations allow descendant operations to consume the results of the antecedent operations.

A continuation task is an asynchronous task that's invoked by another task when the antecedent finishes. Use the ContinueWith() method to chain tasks in C# as in this example:

scraper.cs
// create a task that simulates an
// asynchronous math operation
Task<int> asyncTask = Task.Run(() => {
    // simulate some work
    Task.Delay(2000).Wait();
    return 42;
});

// handle the completed task
Task continuation = asyncTask.ContinueWith((completedTask) => {
    // get the result from the previous task
    int result = completedTask.Result;
    Console.WriteLine($"Task completed with result: {result}");
});

// wait for the continuation, or the program
// may exit before the message is printed
continuation.Wait();

That script will take slightly more than 2 seconds and produce this output:

Output
Task completed with result: 42

Great job! You've become a C# concurrency ninja!
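
For comparison, the same flow can be written with await, where the code after the await plays the role of the continuation. This is a sketch with hypothetical names (ContinuationDemo, ComputeAndFormatAsync()) and a shorter delay to keep it quick. It returns the message instead of printing it.

```csharp
using System.Threading.Tasks;

public static class ContinuationDemo
{
  // Same flow as the ContinueWith() example, written with await.
  public static async Task<string> ComputeAndFormatAsync()
  {
    // antecedent: simulate some asynchronous work
    int result = await Task.Run(async () =>
    {
      await Task.Delay(200);
      return 42;
    });

    // everything after the await acts as the continuation
    return $"Task completed with result: {result}";
  }
}
```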

Difference between Concurrency and Parallelism in C#

It's important to understand the difference between concurrency and parallelism in C#.

Concurrency in C# refers to performing multiple operations in overlapping time periods. Those operations can share even a single thread, which switches from one to the other, giving the illusion of simultaneous execution.

Parallelism, on the other hand, involves the execution of multiple operations on multiple threads at the same time. That's true simultaneous execution, as the operations literally run at the same instant on different CPU cores.

Both concurrency and parallelism are supported by the Task Parallel Library (TPL), a set of public types and APIs for multitasking and multithreading in C#. The TPL handles the partitioning of the work and the scheduling of threads on the thread pool for you.

So, the concurrency vs parallelism C# comparison boils down to the following differences:

  • Concurrency is when two or more operations can start, execute, and complete in overlapping time periods.
  • Parallelism is when operations literally run at the same time, e.g., on a multicore processor.
  • Concurrency can work with a single thread, while parallelism needs more than one thread and more than one CPU core.
  • C# abstracts concurrency and parallelism through the Task Parallel Library.
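
To give a feel for how the TPL partitions work for you, here's a small sketch based on Parallel.For() with per-worker subtotals. ParallelSum and SumOfSquares() are hypothetical names, and the computation is deliberately trivial.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSum
{
  // Sums the squares of 1..n, letting the TPL split the range
  // across the available threads.
  public static long SumOfSquares(int n)
  {
    long total = 0;

    Parallel.For(
      1, n + 1,
      // initial subtotal for each worker
      () => 0L,
      // loop body: accumulate into the worker's own subtotal
      (i, loopState, subtotal) => subtotal + (long)i * i,
      // merge each worker's subtotal into the shared total atomically
      subtotal => Interlocked.Add(ref total, subtotal));

    return total;
  }
}
```

You only describe the loop. How many threads take part and which indexes each one processes is up to the library.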

Conclusion

This C# concurrency tutorial covered everything you need to know about parallel execution in .NET. You started from the basics and then delved into the more advanced concepts of concurrency in C#.

It isn't easy to build a scalable application based on parallel network requests. Implementing and maintaining it takes time and effort. Plus, the more requests you make in a short time, the more suspicious your script will appear. Parallel scraping without getting blocked isn't a piece of cake.

Avoid all that with ZenRows. As a complete API for web scraping, it offers parallelization capabilities and the most advanced anti-bot toolkit in existence. Perform parallel data scraping via API calls with no effort. Try ZenRows for free!

Frequent Questions

Does C# Have Concurrency?

Yes, C# supports concurrency. In detail, it allows developers to control tasks or operations running concurrently. That's possible thanks to the Task Parallel Library, which enables multithreading and asynchronous programming.

What Are Concurrency Patterns in C#?

Concurrency patterns in C# are established approaches for managing and controlling concurrent tasks. Some common patterns include:

  • Asynchronous Programming: Using async/await for achieving non-blocking execution.
  • Parallel Programming: Employing the Task Parallel Library (Task, Parallel.For, Parallel.ForEach) to spread work across multiple threads.
  • Producer-Consumer Pattern: Coordinating tasks that generate data (producers) and tasks that consume it (consumers) using concurrent queues.
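
As an example of the last pattern, here's a minimal producer-consumer sketch built on BlockingCollection<T> from System.Collections.Concurrent. ProducerConsumerDemo and Run() are hypothetical names, and the bounded capacity of 10 is an arbitrary choice for illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
  // One producer pushes the numbers 1..itemCount into a bounded queue,
  // one consumer adds them up. Returns the consumer's total.
  public static long Run(int itemCount)
  {
    // the bounded capacity makes the producer wait
    // when the consumer falls behind
    using var queue = new BlockingCollection<int>(boundedCapacity: 10);

    Task producer = Task.Run(() =>
    {
      for (int i = 1; i <= itemCount; i++)
      {
        queue.Add(i);
      }
      // tell the consumer that no more items will arrive
      queue.CompleteAdding();
    });

    Task<long> consumer = Task.Run(() =>
    {
      long total = 0;
      // blocks while the queue is empty and ends
      // once adding is complete and the queue is drained
      foreach (int item in queue.GetConsumingEnumerable())
      {
        total += item;
      }
      return total;
    });

    Task.WaitAll(producer, consumer);
    return consumer.Result;
  }
}
```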

Is C# Single-threaded or Multithreaded?

C# is a multithreaded programming language. It provides robust support for creating and managing more than one thread within the same application. That means C# tasks can run on different threads, which the OS can schedule on different CPU cores.
