Concurrency in C# means running multiple operations in overlapping time periods to save time. It's powerful but also complex, so this tutorial walks through it step by step.
What Is Concurrency in C#?
Concurrency in C# refers to the ability of an application to run more than one operation at the same time without blocking the main thread. It allows multiple tasks to progress independently, which usually leads to better performance.
Pros of concurrency:
- Improved performance, with significantly reduced execution time.
- Reduced idle time.
Cons of concurrency:
- Increases resource usage, which can lead to system overloads.
- Developing, debugging, and maintaining concurrent code is more challenging.
- Introduces overhead.
- Race conditions can lead to unpredictable behavior.
- Not all methods from the C# standard API are thread-safe.
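To make the race-condition point concrete, here is a minimal sketch. The `RaceConditionDemo` class and its `Count()` helper are hypothetical names used only for illustration and are not part of the tutorial's script. Several threads increment a shared counter: the plain `counter++` is a non-atomic read-modify-write that can lose updates, while `Interlocked.Increment()` applies each update atomically.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class RaceConditionDemo
{
    // Hypothetical helper: increments a shared counter from several threads.
    // When useInterlocked is false, updates can be lost.
    public static int Count(int threadCount, int incrementsPerThread, bool useInterlocked)
    {
        int counter = 0;
        var workers = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(() =>
            {
                for (int j = 0; j < incrementsPerThread; j++)
                {
                    if (useInterlocked)
                    {
                        // atomic update: no increment is lost
                        Interlocked.Increment(ref counter);
                    }
                    else
                    {
                        // racy update: two threads may read the same value
                        counter++;
                    }
                }
            });
            workers.Add(worker);
            worker.Start();
        }
        // wait for every worker before reading the final value
        foreach (var w in workers)
        {
            w.Join();
        }
        return counter;
    }

    public static void Main()
    {
        // may print less than 400000 because of lost updates
        Console.WriteLine($"Unsynchronized: {Count(4, 100_000, false)}");
        // the atomic version reaches the full total
        Console.WriteLine($"Interlocked: {Count(4, 100_000, true)}");
    }
}
```

The unsynchronized result varies from run to run, which is exactly what makes race conditions hard to debug.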
When to Implement Concurrency
Consider C# concurrency every time your application needs to handle many I/O operations. You can generally run file I/O actions, network requests, and database queries in parallel to improve performance.
Another scenario where concurrency is beneficial is in applications involving CPU-bound operations, such as heavy computations or data processing. Here, the idea is to divide the load into smaller units and execute them concurrently.
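As a rough illustration of dividing a CPU-bound load into smaller units, the sketch below splits an array into chunks and sums the squares of each chunk on a separate task. `ChunkedSumDemo`, `SumOfSquares()`, and the chunk count are assumptions made for this example. It relies on `Task.Run()`, which is covered later in this tutorial.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public class ChunkedSumDemo
{
    // Hypothetical helper: splits the input into chunkCount slices
    // and sums the squares of each slice on its own task.
    public static long SumOfSquares(int[] numbers, int chunkCount)
    {
        int chunkSize = (numbers.Length + chunkCount - 1) / chunkCount;
        var tasks = new Task<long>[chunkCount];
        for (int c = 0; c < chunkCount; c++)
        {
            // per-iteration copies, so each lambda captures its own bounds
            int start = c * chunkSize;
            int end = Math.Min(start + chunkSize, numbers.Length);
            tasks[c] = Task.Run(() =>
            {
                long partial = 0;
                for (int i = start; i < end; i++)
                {
                    partial += (long)numbers[i] * numbers[i];
                }
                return partial;
            });
        }
        // block until every partial sum is ready, then combine them
        Task.WaitAll(tasks);
        return tasks.Sum(t => t.Result);
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000).ToArray();
        // prints 333833500
        Console.WriteLine(SumOfSquares(numbers, 4));
    }
}
```

Each task writes only to its own local `partial` variable, so no locking is needed until the results are combined.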
How to Use Concurrency in C#
Let's start with a sequential script and make it work in parallel to learn how to use concurrency in C#.
Step 1: Start with a Non-concurrent Script
Suppose you have the following C# script, which visits 5 web pages sequentially:
public class Program
{
public static void Main()
{
// URLs of the pages to visit
var pageURLs = new List<string> {
"https://www.scrapingcourse.com/ecommerce/page/1/",
"https://www.scrapingcourse.com/ecommerce/page/2/",
"https://www.scrapingcourse.com/ecommerce/page/3/",
"https://www.scrapingcourse.com/ecommerce/page/4/",
"https://www.scrapingcourse.com/ecommerce/page/5/"
};
// initialize the common HTTP client to make
// all the requests
HttpClient client = new HttpClient();
// perform the requests sequentially
foreach (var pageURL in pageURLs)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
// dispose the HTTP client
client.Dispose();
}
}
Run it:
dotnet run
It'll produce the following output. The C# program executes each HTTP request to the specified URL in sequence and in the expected order.
Request to 'https://www.scrapingcourse.com/ecommerce/page/1/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/2/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/3/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/4/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/5/' completed with status code 'OK'!
Now, add the following logic to measure the code execution time:
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// operation to measure the time
stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
Stopwatch is a C# standard class that exposes methods and properties to accurately measure elapsed time. It comes from System.Diagnostics, so you'll also need the import below:
using System.Diagnostics;
Put it all together:
using System.Diagnostics;
public class Program
{
public static void Main()
{
// to measure the time required by the script
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// URLs of the pages to visit
var pageURLs = new List<string> {
"https://www.scrapingcourse.com/ecommerce/page/1/",
"https://www.scrapingcourse.com/ecommerce/page/2/",
"https://www.scrapingcourse.com/ecommerce/page/3/",
"https://www.scrapingcourse.com/ecommerce/page/4/",
"https://www.scrapingcourse.com/ecommerce/page/5/"
};
// initialize the common HTTP client to make
// all the requests
HttpClient client = new HttpClient();
// perform the requests sequentially
foreach (var pageURL in pageURLs)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
// dispose the HTTP client
client.Dispose();
// get the elapsed time in seconds
stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
}
}
Launch the script, and you'll notice this extra message:
...
Elapsed time: 4.721s
Great! The scraper visited five pages in about 4.7 seconds, so each request takes around one second.
But what if you want the script to make the requests in parallel? Read on to learn how!
Step 2: Create Threads for Concurrent Requests
You now want each HTTP request to get executed by an independent thread. The first step is to isolate the request execution logic into a function:
private static void ProcessRequest(HttpClient client, string pageURL)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
Then, you need to create a list of threads. Each one will execute an HTTP request through the ProcessRequest() function.
List<Thread> threads = new List<Thread>();
Several C# threads can share the same HttpClient instance, as its methods are thread-safe.
In C#, the constructor for the Thread class accepts a function that represents the task to be executed on the thread. Iterate over each page URL, create a new Thread, and add it to the list:
foreach (var pageURL in pageURLs)
{
Thread thread = new Thread(() =>
{
ProcessRequest(client, pageURL);
});
threads.Add(thread);
}
Awesome! The threads list now contains threads that are ready to be started.
This is the current code of your new C# concurrency script:
using System.Diagnostics;
public class Program
{
public static void Main()
{
// to measure the time required by the script
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// URLs of the pages to visit
var pageURLs = new List<string> {
"https://www.scrapingcourse.com/ecommerce/page/1/",
"https://www.scrapingcourse.com/ecommerce/page/2/",
"https://www.scrapingcourse.com/ecommerce/page/3/",
"https://www.scrapingcourse.com/ecommerce/page/4/",
"https://www.scrapingcourse.com/ecommerce/page/5/"
};
// initialize the common HTTP client to make
// all the requests
HttpClient client = new HttpClient();
// initialize the list of threads
List<Thread> threads = new List<Thread>();
// define each thread and add it to the list
foreach (var pageURL in pageURLs)
{
Thread thread = new Thread(() =>
{
ProcessRequest(client, pageURL);
});
threads.Add(thread);
}
// launch the threads...
// dispose the HTTP client
client.Dispose();
// get the elapsed time in seconds
stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
}
private static void ProcessRequest(HttpClient client, string pageURL)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
}
Well done! It's time to fire the threads you just defined.
Step 3: Start All Threads
To start a Thread, call its Start() method. That instructs the OS to change the state of the current thread instance to "Running".
Iterate over each thread in the threads list and launch them all:
foreach (var thread in threads)
{
thread.Start();
}
When you call thread.Start(), this is what happens behind the scenes:
- Thread creation: The runtime asks the operating system to create a new dedicated thread. Threads created with new Thread() don't come from the thread pool.
- Scheduling: The operating system's scheduler decides when the new thread gets CPU time. On a busy core, that involves a context switch: the state of the running thread is saved, and the state of the new thread is loaded into the CPU registers.
- Thread logic execution: The CPU starts executing the logic defined in the function referenced by the thread object.
Those steps occur for each thread, so the functions passed to the threads run concurrently. In other words, the script will perform the five HTTP requests in parallel.
Perfect! Each thread is now running.
Keep in mind that C# threads are individual units of execution that run concurrently with the main application thread. When a new thread starts, it operates independently from the main thread. Thus, the main thread doesn't automatically wait for it to complete.
This independence is what enables parallelism. It also implies that you must explicitly wait for each thread to finish its execution. Learn how in the next step!
Step 4: Wait for Threads To Finish
The Join() method instructs the C# program to block the execution until the thread instance terminates. Call it on each thread to wait for them to complete:
foreach (var thread in threads)
{
thread.Join();
}
After calling Join(), the OS keeps executing the thread until it reaches the end of its function or raises an exception. Once the thread terminates, the OS releases its resources and the C# script can continue.
In this case, the C# program will wait for all threads to end before moving on to the next line of code.
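Waiting indefinitely isn't always desirable. As a small sketch, `Join()` also has an overload that takes a timeout in milliseconds and returns a `bool` telling you whether the thread finished in time. The `JoinTimeoutDemo` class and the sleep durations below are illustrative assumptions, with `Thread.Sleep()` standing in for real work.

```csharp
using System;
using System.Threading;

public class JoinTimeoutDemo
{
    // Hypothetical helper: returns true if the worker finished within the timeout.
    public static bool RunWithTimeout(int workMs, int timeoutMs)
    {
        // Thread.Sleep simulates work that takes workMs milliseconds
        var worker = new Thread(() => Thread.Sleep(workMs));
        // background threads don't keep the process alive after Main returns
        worker.IsBackground = true;
        worker.Start();
        // blocks for at most timeoutMs milliseconds
        return worker.Join(timeoutMs);
    }

    public static void Main()
    {
        // short work, generous timeout: the worker should finish in time
        Console.WriteLine($"Finished in time: {RunWithTimeout(50, 2000)}");
        // long work, short timeout: Join gives up before the worker ends
        Console.WriteLine($"Finished in time: {RunWithTimeout(2000, 50)}");
    }
}
```

Note that a timed-out `Join()` doesn't stop the thread. It only stops waiting for it.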
Congrats! You just implemented concurrency in C#. It's time to try the script.
Step 5: Launch the C# Concurrency Script
This is the complete thread-based C# script:
using System.Diagnostics;
public class Program
{
public static void Main()
{
// to measure the time required by the script
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// URLs of the pages to visit
var pageURLs = new List<string> {
"https://www.scrapingcourse.com/ecommerce/page/1/",
"https://www.scrapingcourse.com/ecommerce/page/2/",
"https://www.scrapingcourse.com/ecommerce/page/3/",
"https://www.scrapingcourse.com/ecommerce/page/4/",
"https://www.scrapingcourse.com/ecommerce/page/5/"
};
// initialize the common HTTP client to make
// all the requests
HttpClient client = new HttpClient();
// initialize the list of threads
List<Thread> threads = new List<Thread>();
// define each thread and add it to the list
foreach (var pageURL in pageURLs)
{
Thread thread = new Thread(() =>
{
ProcessRequest(client, pageURL);
});
threads.Add(thread);
}
// launch each thread
foreach (var thread in threads)
{
thread.Start();
}
// wait for all threads to complete
foreach (var thread in threads)
{
thread.Join();
}
// dispose the HTTP client
client.Dispose();
// get the elapsed time in seconds
stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
}
private static void ProcessRequest(HttpClient client, string pageURL)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
}
Launch it, and you'll get an output similar to this:
Request to 'https://www.scrapingcourse.com/ecommerce/page/3/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/2/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/4/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/5/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/1/' completed with status code 'OK'!
Elapsed time: 1.676s
The responses no longer arrive in the original order, which shows that the requests ran in parallel.
Each time you run the script, you may get a different order because it depends on which thread terminates first.
The total execution time is slightly longer than 1 second, which makes sense as each request takes about 1 second. The requests are now made in parallel, and the execution time is:
Time to execute the slowest request + Time to handle threads
Don't forget that creating and controlling threads comes at a cost in terms of time and resources. Using them is beneficial only when the time saved is greater than the overhead introduced. Here, we got a ~3x time improvement, which more than justifies thread use!
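The formula above can be checked without any network access. In the sketch below, `Thread.Sleep()` stands in for the requests, and the `SlowestTaskDemo` class and the durations are assumptions chosen for illustration. Three simulated operations of 100, 200, and 300 ms run on separate threads, so the total should land a little above 300 ms instead of near the 600 ms a sequential run would need.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public class SlowestTaskDemo
{
    // Hypothetical helper: runs one sleeping thread per duration
    // and returns the total elapsed time in milliseconds.
    public static double RunInParallel(int[] durationsMs)
    {
        var stopwatch = Stopwatch.StartNew();
        var workers = new List<Thread>();
        foreach (int duration in durationsMs)
        {
            // Thread.Sleep simulates an operation of the given length
            var worker = new Thread(() => Thread.Sleep(duration));
            workers.Add(worker);
            worker.Start();
        }
        // wait for all workers before stopping the clock
        foreach (var w in workers)
        {
            w.Join();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        double elapsedMs = RunInParallel(new[] { 100, 200, 300 });
        // expected to be slightly above 300 ms, well below the 600 ms sum
        Console.WriteLine($"Elapsed time: {elapsedMs:F0}ms");
    }
}
```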
Advanced C# Thread Handling With ThreadPool
The optimal number of threads to use depends on the CPU cores available, task types, and other factors. As you can imagine, determining it is complex, but manually opening threads without taking them into account can lead to system overload.
The solution is a thread pool, which is a managed collection of threads optimized for short-running tasks. It maintains a set of worker threads for you, queues your tasks, and efficiently reuses threads upon task completion.
That approach minimizes the overhead of creating and destroying threads for each task, leading to better performance and resource use.
The easiest way to deal with a thread pool in C# is through the ThreadPool class. Its QueueUserWorkItem() static method queues a function for execution with one of the threads in the pool:
foreach (var pageURL in pageURLs)
{
ThreadPool.QueueUserWorkItem(_ =>
{
ProcessRequest(client, pageURL);
});
}
The default size of the thread pool depends on several factors, such as how large the virtual address space is. Call the GetMaxThreads() static method to retrieve the maximum number of threads the pool can have active at the same time. To change it, use SetMaxThreads().
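Here is a short sketch of how those calls look in practice. The `ThreadPoolInfoDemo` class and its `GetWorkerLimits()` helper are hypothetical names for this example. Both `GetMinThreads()` and `GetMaxThreads()` return two values through `out` parameters: one for worker threads and one for asynchronous I/O completion threads. `SetMaxThreads()` returns `false` when it rejects the requested values.

```csharp
using System;
using System.Threading;

public class ThreadPoolInfoDemo
{
    // Hypothetical helper: reads the worker-thread limits of the pool.
    public static (int Min, int Max) GetWorkerLimits()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        return (minWorker, maxWorker);
    }

    public static void Main()
    {
        var limits = GetWorkerLimits();
        // the actual numbers depend on the machine and the runtime
        Console.WriteLine($"Worker threads: min {limits.Min}, max {limits.Max}");

        // re-apply the current maximums just to show the call shape
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        bool accepted = ThreadPool.SetMaxThreads(maxWorker, maxIo);
        Console.WriteLine($"SetMaxThreads accepted: {accepted}");
    }
}
```

In most applications the defaults are a sensible starting point, so treat changing them as a tuning step to measure rather than a requirement.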
The main issue with ThreadPool is that it doesn't provide a method to wait for the queued work items to finish. As a workaround, you can use a CountdownEvent object as below. It's a synchronization primitive that is signaled when its internal count reaches zero.
CountdownEvent countdownEvent = new CountdownEvent(pageURLs.Count);
foreach (var pageURL in pageURLs)
{
ThreadPool.QueueUserWorkItem(_ =>
{
ProcessRequest(client, pageURL);
// signal that this task is completed
countdownEvent.Signal();
});
}
// wait for all threads to terminate
countdownEvent.Wait();
Et voilà! You're now a C# concurrency thread master!
Use Tasks instead of Threads
The task-based asynchronous programming pattern (TAP) is another way to implement concurrency in C#. Instead of using manual threads, it allows you to perform asynchronous operations in tasks.
In C#, a Task is the core concept of the TAP and represents an asynchronous operation. It accepts a function that represents the asynchronous logic to perform. Under the hood, C# executes tasks asynchronously on the thread pool.
The benefits of using tasks over threads in C# are:
- Higher abstraction: Tasks work on top of threads and provide a higher-level abstraction for managing async operations.
- Improved readability: Asynchronous logic is more readable and easier to understand than concurrent code involving threads.
- Operation chaining: Tasks make it easier to chain operations and specify what should happen next when a task terminates.
Let's now see how to use Tasks to build a concurrent script in C#!
Step 1: Define and Start the Tasks
As you did before with threads, the first step is to isolate the task logic in a new function:
private static void ProcessRequest(string pageURL, HttpClient client)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
Create a list of Task objects, iterate over the page URLs, and populate the list with new tasks. Task.Run() is a static method that transforms a function into a Task and queues it on the thread pool for execution. If a thread in the pool is free, the task will run immediately:
// initialize the list of tasks
List<Task> tasks = new List<Task>();
// define each task and add it to the list
foreach (var pageURL in pageURLs)
{
tasks.Add(Task.Run(() => ProcessRequest(pageURL, client)));
}
Your current script will be:
using System.Diagnostics;
public class Program
{
public static void Main()
{
// to measure the time required by the script
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// URLs of the pages to visit
var pageURLs = new List<string> {
"https://www.scrapingcourse.com/ecommerce/page/1/",
"https://www.scrapingcourse.com/ecommerce/page/2/",
"https://www.scrapingcourse.com/ecommerce/page/3/",
"https://www.scrapingcourse.com/ecommerce/page/4/",
"https://www.scrapingcourse.com/ecommerce/page/5/"
};
// initialize the common HTTP client to make
// all the requests
HttpClient client = new HttpClient();
// initialize the list of tasks
List<Task> tasks = new List<Task>();
// define each task and add it to the list
foreach (var pageURL in pageURLs)
{
tasks.Add(Task.Run(() => ProcessRequest(pageURL, client)));
}
// wait for tasks to complete...
// dispose the HTTP client
client.Dispose();
// get the elapsed time in seconds
stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
}
private static void ProcessRequest(string pageURL, HttpClient client)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
}
Wonderful! It only remains to wait for the tasks to terminate.
Step 2: Wait for Tasks To Complete
In C# concurrency, you have two approaches to waiting for task completion. The first involves calling the Wait() method on each task in the list:
foreach (var task in tasks)
{
task.Wait();
}
Otherwise, use the Task.WhenAll() static method. It returns a new Task that is completed when all the tasks supplied in the list have been completed.
await Task.WhenAll(tasks);
The await operator suspends execution until the asynchronous operation represented by the resulting Task is completed. For await to work, the enclosing method must be marked with async.
That implies you need to change the signature of the Main method as below:
public static async Task Main()
An async Main method must return Task or Task<int>, as this is how C# represents asynchronous code that callers can wait on. The async and await keywords are at the heart of asynchronous programming in C#.
Assemble the entire logic, and you'll get the following script:
using System.Diagnostics;
public class Program
{
public static async Task Main()
{
// to measure the time required by the script
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// URLs of the pages to visit
var pageURLs = new List<string> {
"https://www.scrapingcourse.com/ecommerce/page/1/",
"https://www.scrapingcourse.com/ecommerce/page/2/",
"https://www.scrapingcourse.com/ecommerce/page/3/",
"https://www.scrapingcourse.com/ecommerce/page/4/",
"https://www.scrapingcourse.com/ecommerce/page/5/"
};
// initialize the common HTTP client to make
// all the requests
HttpClient client = new HttpClient();
// initialize the list of tasks
List<Task> tasks = new List<Task>();
// define each task and add it to the list
foreach (var pageURL in pageURLs)
{
tasks.Add(Task.Run(() => ProcessRequest(pageURL, client)));
}
// wait for all tasks to complete
await Task.WhenAll(tasks);
// dispose the HTTP client
client.Dispose();
// get the elapsed time in seconds
stopwatch.Stop();
double elapsedTimeS = stopwatch.ElapsedMilliseconds / 1000.0;
Console.WriteLine($"Elapsed time: {elapsedTimeS}s");
}
private static void ProcessRequest(string pageURL, HttpClient client)
{
var response = client.GetAsync(pageURL).Result;
Console.WriteLine($"Request to '{pageURL}' completed with status code '{response.StatusCode}'!");
}
}
Run it, and it'll print this:
Request to 'https://www.scrapingcourse.com/ecommerce/page/2/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/3/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/4/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/5/' completed with status code 'OK'!
Request to 'https://www.scrapingcourse.com/ecommerce/page/1/' completed with status code 'OK'!
Elapsed time: 1.848s
Mission completed again! The URLs are not in the same order as in the code, so the execution is parallel. The elapsed time should be comparable to the time obtained with the thread-based program.
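The script above wraps a blocking call (`.Result`) in `Task.Run()`, which keeps a pool thread busy while it waits. As a variation, the sketch below awaits the operation instead and collects return values through `Task.WhenAll()`. `FetchStatusAsync()` is a hypothetical stand-in that simulates a request with `Task.Delay()`, and the class name and page labels are assumptions. The earlier pages are given longer delays on purpose to show that `Task.WhenAll()` returns results in the order of the input tasks, not in completion order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class WhenAllResultsDemo
{
    // Hypothetical stand-in for an HTTP request: waits, then returns a label.
    public static async Task<string> FetchStatusAsync(string pageURL, int delayMs)
    {
        // non-blocking wait: no thread is held while the delay runs
        await Task.Delay(delayMs);
        return $"{pageURL}: done";
    }

    public static async Task<string[]> FetchAllAsync(IReadOnlyList<string> pageURLs)
    {
        var tasks = new List<Task<string>>();
        for (int i = 0; i < pageURLs.Count; i++)
        {
            // earlier pages get longer delays, so they complete last
            int delayMs = 50 * (pageURLs.Count - i);
            tasks.Add(FetchStatusAsync(pageURLs[i], delayMs));
        }
        // the result array follows the order of the tasks list
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        string[] results = await FetchAllAsync(new[] { "page/1", "page/2", "page/3" });
        foreach (string result in results)
        {
            // prints page/1, page/2, page/3 in that order
            Console.WriteLine(result);
        }
    }
}
```

With a real `HttpClient`, the same shape would apply by awaiting `client.GetAsync(pageURL)` inside the async method instead of the simulated delay.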
Delaying Tasks
The Task.Delay() method introduces an intentional pause to an asynchronous operation. It's designed to add non-blocking delays within async code to control when the next operation should start.
Here's a simple example of how to use Task.Delay() to introduce a delay of two seconds in your task:
public async Task DelayedOperationAsync()
{
// perform some work
// introduce a 2-second delay
await Task.Delay(2000);
// continue with the rest of the operation...
}
Introducing task delays is crucial for several reasons:
- Resource management: Task delays are useful when dealing with limited resources. They allow you to pace resource-intensive operations to avoid overloading system resources.
- Operation orchestration: Delays can help define the sequence in which tasks run.
- Dealing with rate limiting: Task delays can help in rate-limiting operations that interact with external services, APIs, or network resources. This prevents overwhelming the target service with too many requests in a short time.
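To illustrate the rate-limiting point, the sketch below launches operations one after another with a fixed gap between launches. The `PacedLaunchDemo` class, the `RunPacedAsync()` helper, and the 200 ms gap are assumptions for this example, and the `Console.WriteLine()` call stands in for a request to a rate-limited service.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public class PacedLaunchDemo
{
    // Hypothetical helper: starts operationCount operations,
    // pausing gapMs milliseconds between consecutive launches.
    public static async Task<double> RunPacedAsync(int operationCount, int gapMs)
    {
        var stopwatch = Stopwatch.StartNew();
        var tasks = new List<Task>();
        for (int i = 0; i < operationCount; i++)
        {
            int id = i; // per-iteration copy for the lambda
            // placeholder for a request to a rate-limited service
            tasks.Add(Task.Run(() => Console.WriteLine($"Operation {id} started")));
            // non-blocking pause before launching the next operation
            if (i < operationCount - 1)
            {
                await Task.Delay(gapMs);
            }
        }
        await Task.WhenAll(tasks);
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static async Task Main()
    {
        double elapsedMs = await RunPacedAsync(3, 200);
        // two gaps of 200 ms, so roughly 400 ms or slightly more
        Console.WriteLine($"Elapsed time: {elapsedMs:F0}ms");
    }
}
```

A fixed gap is the simplest pacing strategy. Services that publish explicit rate limits may call for a more precise scheme.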
Task Chaining with Continuation
In asynchronous programming, a common use case is to have one task invoke another operation upon completion. That concept is also known as task continuation. Continuations allow descendant operations to consume the results of the antecedent operations.
A continuation task is an asynchronous task that's invoked by another task when the antecedent finishes. Use the ContinueWith() method to chain tasks in C# as in this example:
// create a task that simulates an asynchronous
// math operation
Task<int> asyncTask = Task.Run(() => {
// simulate some work
Task.Delay(2000).Wait();
return 42;
});
// handle the completed task
Task continuation = asyncTask.ContinueWith((completedTask) => {
// get the result from the previous task
int result = completedTask.Result;
Console.WriteLine($"Task completed with result: {result}");
});
// wait for the continuation, or the program may exit before it prints
continuation.Wait();
That script will take a little more than 2 seconds and result in this output:
Task completed with result: 42
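The same chaining can usually be expressed with `await`, where the code after the `await` plays the role of the continuation. The sketch below is one possible equivalent. The `AwaitContinuationDemo` class and its method names are hypothetical, and the delay is shortened to 200 ms to keep the example quick.

```csharp
using System;
using System.Threading.Tasks;

public class AwaitContinuationDemo
{
    // Simulates an asynchronous math operation.
    public static async Task<int> ComputeAsync()
    {
        // non-blocking pause standing in for real work
        await Task.Delay(200);
        return 42;
    }

    // The code after the await runs once ComputeAsync() has completed,
    // just like the body of a ContinueWith() callback.
    public static async Task<string> DescribeAsync()
    {
        int result = await ComputeAsync();
        return $"Task completed with result: {result}";
    }

    public static async Task Main()
    {
        // prints "Task completed with result: 42"
        Console.WriteLine(await DescribeAsync());
    }
}
```

Unlike `completedTask.Result`, which wraps failures in an `AggregateException`, `await` rethrows the original exception, which tends to make error handling simpler.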
Great job! You've become a C# concurrency ninja!
Difference between Concurrency and Parallelism in C#
It's important to understand the difference between concurrency and parallelism in C#.
Concurrency in C# refers to performing multiple operations in an overlapping way. Those operations can make progress even on a single thread, which switches from one to the other, giving the illusion of simultaneous execution.
Parallelism, on the other hand, involves the execution of multiple operations on multiple threads at the same time. That's truly simultaneous execution, as the operations run on different CPU cores.
Both concurrency and parallelism are supported by the Task Parallel Library (TPL), a set of public types and APIs for multitasking and multithreading in C#. The TPL handles the partitioning of the work and the scheduling of threads on the thread pool for you.
So, the concurrency vs parallelism C# comparison boils down to the following differences:
- Concurrency is when two or more operations can start, execute, and complete in overlapping time periods.
- Parallelism is when operations literally run at the same time, e.g., on a multicore processor.
- Concurrency can work with only one thread, while parallelism needs more than one thread and more than one CPU core.
- C# abstracts concurrency and parallelism through the Task Parallel Library.
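As an example of the TPL handling partitioning and scheduling for you, the sketch below uses `Parallel.For()` to fill an array of squares. The `ParallelForDemo` class and the `Squares()` helper are hypothetical names chosen for this illustration. You only describe the work per index, and the library decides how to split the range across pool threads.

```csharp
using System;
using System.Threading.Tasks;

public class ParallelForDemo
{
    // Hypothetical helper: computes i * i for every index in parallel.
    public static long[] Squares(int count)
    {
        var squares = new long[count];
        // the TPL partitions the index range and schedules it on pool threads
        Parallel.For(0, count, i =>
        {
            // each iteration writes to its own slot, so no locking is needed
            squares[i] = (long)i * i;
        });
        return squares;
    }

    public static void Main()
    {
        long[] squares = Squares(10);
        // prints 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
        Console.WriteLine(string.Join(", ", squares));
    }
}
```

The iterations may execute in any order, but the output is stable because each result is stored at its own index.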
Conclusion
This C# concurrency tutorial covered everything you need to know about parallel execution in .NET. You started from the basics and then delved into the more advanced concepts of concurrency in C#.
It isn't easy to build a scalable application based on parallel network requests. Implementing and maintaining it takes time and effort. Plus, the more requests you make in a short time, the more suspicious your script will appear. Parallel scraping without getting blocked isn't a piece of cake.
Avoid all that with ZenRows. As a complete API for web scraping, it offers parallelization capabilities and the most advanced anti-bot toolkit in existence. Perform parallel data scraping via API calls with no effort. Try ZenRows for free!
Frequent Questions
Does C# Have Concurrency?
Yes, C# supports concurrency. In detail, it allows developers to control tasks or operations running concurrently. That's possible thanks to the Task Parallel Library, which enables multithreading and asynchronous programming.
What Are Concurrency Patterns in C#?
Concurrency patterns in C# are established approaches for managing and controlling concurrent tasks. Some common patterns include:
- Asynchronous Programming: Using async/await for achieving non-blocking execution.
- Parallel Programming: Employing the Task Parallel Library for parallel execution of Threads and Tasks.
- Producer-Consumer Pattern: Coordinating tasks that generate data (producers) and tasks that consume it (consumers) using concurrent queues.
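The producer-consumer pattern can be sketched with `BlockingCollection<T>` from `System.Collections.Concurrent`. The `ProducerConsumerDemo` class, the bounded capacity of 2, and the summing consumer are assumptions for this example. The producer adds numbers and then marks the collection as complete, while the consumer reads items until none are left.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class ProducerConsumerDemo
{
    // Hypothetical helper: produces 1..itemCount and returns the consumed sum.
    public static int ProduceAndConsume(int itemCount)
    {
        // bounded to 2 items: Add() blocks while the queue is full
        using var queue = new BlockingCollection<int>(2);

        var producer = Task.Run(() =>
        {
            for (int i = 1; i <= itemCount; i++)
            {
                queue.Add(i);
            }
            // tell consumers that no more items will arrive
            queue.CompleteAdding();
        });

        var consumer = Task.Run(() =>
        {
            int sum = 0;
            // blocks while the queue is empty, ends after CompleteAdding()
            foreach (int item in queue.GetConsumingEnumerable())
            {
                sum += item;
            }
            return sum;
        });

        Task.WaitAll(producer, consumer);
        return consumer.Result;
    }

    public static void Main()
    {
        // prints 15 (1 + 2 + 3 + 4 + 5)
        Console.WriteLine(ProduceAndConsume(5));
    }
}
```

The bounded capacity applies back-pressure: a fast producer has to wait for the consumer instead of filling memory with pending items.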
Is C# Single-threaded or Multithreaded?
C# is a multithreaded programming language. It provides robust support for creating and managing more than one thread within the same application. That means C# tasks can run on different threads, which the operating system can schedule on different CPU cores.