Facing CAPTCHAs can be frustrating, especially when web scraping with Selenium. This is because tools like Selenium often trigger anti-bot systems to display CAPTCHAs, asking you to prove you're human.
But today, you'll learn how to bypass CAPTCHAs with Selenium C# using the following methods.
- Method #1: Use a paid CAPTCHA solver with Selenium C#
- Method #2: Bypass CAPTCHA with a Web Scraping API
Ready? Let's dive in.
Can Selenium in C# Bypass CAPTCHA?
Although CAPTCHA challenges are designed to keep out automated systems, you can still interact with CAPTCHA elements on a webpage using Selenium C#. This allows you to solve them using one of the two approaches below.
The first involves sending the CAPTCHA data to a third-party CAPTCHA-solving service and retrieving the solution.
Alternatively, you can avoid CAPTCHAs altogether. Websites mostly display CAPTCHA challenges when you trigger their anti-bot systems. So, if you can tip-toe your way through, appearing human to the target server, you won't encounter any CAPTCHA challenge. This method is often recommended because of its high success rate.
Let's explore both approaches in detail.
Method #1: Use a Paid CAPTCHA Solver With Selenium C#
Third-party services typically work by outsourcing your CAPTCHA challenges to a human workforce or automatically solving them using advanced algorithms.
This tutorial uses 2captcha, a CAPTCHA-solving service that provides an API endpoint for submitting CAPTCHA challenges and receiving solutions quickly.
2captcha uses a two-step process. First, you send a request containing the CAPTCHA data you want to solve. Then, you poll for the result using the request ID you'll receive as the response from the initial request.
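The submit-then-poll flow can be sketched in a few lines. This is a minimal illustration, not 2captcha's actual client: the `submit` and `fetchResult` delegates, the `NotReady` marker, and the class name are all hypothetical stand-ins for whatever calls and responses the real service uses.

```csharp
using System;
using System.Threading.Tasks;

public static class SubmitAndPollSketch
{
    // Hypothetical marker a service might return while the solution is still pending.
    public const string NotReady = "NOT_READY";

    // Step 1: submit the CAPTCHA data and receive a request ID.
    // Step 2: poll with that ID until a solution arrives or the attempts run out.
    public static async Task<string> SolveAsync(
        Func<Task<string>> submit,
        Func<string, Task<string>> fetchResult,
        int maxAttempts,
        TimeSpan delay)
    {
        string requestId = await submit();

        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            string result = await fetchResult(requestId);
            if (result != NotReady)
            {
                return result; // the solved token
            }

            await Task.Delay(delay); // wait before asking again
        }

        throw new TimeoutException(
            $"No solution for request {requestId} after {maxAttempts} attempts.");
    }
}
```

Capping the number of attempts keeps a stuck request from blocking the scraper forever. The right delay and cap depend on the service, so treat the values you pass in as tunable.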
In the case of an audio challenge, your CAPTCHA data would include the base-64 encoded audio file and the language of the audio record.
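However, for Google's reCAPTCHA, which this tutorial targets, you'll need to send the reCAPTCHA site key along with the URL of the page. The site key is a public identifier that ties the reCAPTCHA widget to the website it's registered for.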
To find your CAPTCHA site key, inspect the webpage in a browser and look for an element with the data-sitekey attribute.
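If you'd rather not look the key up by hand, a small helper can pull it out of the page markup. The sketch below is an assumption-laden illustration: the class and method names are made up, and it only works when the data-sitekey attribute is present in the HTML you pass in (some sites set the key from JavaScript instead).

```csharp
using System.Text.RegularExpressions;

public static class SiteKeySketch
{
    // Returns the first data-sitekey value found in the HTML, or null if none is present.
    public static string ExtractSiteKey(string html)
    {
        Match match = Regex.Match(html, "data-sitekey\\s*=\\s*[\"']([^\"']+)[\"']");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With Selenium, you could call it as SiteKeySketch.ExtractSiteKey(driver.PageSource) after the page has loaded.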
Before taking another step, you need a 2captcha API key, which you get when you sign up. Also, add the Selenium WebDriver and the 2captcha-C# package to your project by entering the following commands in your terminal.
dotnet add package Selenium.WebDriver
dotnet add package 2captcha-csharp
2captcha-C# is the official C# library for easy integration with the 2captcha API. Instead of manually writing code to make requests to retrieve CAPTCHA IDs and poll for the results using those IDs, this library handles all of that complexity for you.
Once you have these prerequisites, you're ready to write your Selenium CAPTCHA-solving C# script.
So, start by importing the required libraries.
// import the required libraries
using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using TwoCaptcha.Captcha;
Next, set up ChromeDriver, navigate to the target website (https://www.google.com/recaptcha/api2/demo), and create a new TwoCaptcha instance using your API key.
namespace TwoCaptcha
{
    public class Scraper
    {
        static void Main(string[] args)
        {
            // set up ChromeDriver
            IWebDriver driver = new ChromeDriver();
            // navigate to the target url
            string target_url = "https://www.google.com/recaptcha/api2/demo";
            driver.Navigate().GoToUrl(target_url);
            // create a new TwoCaptcha instance
            TwoCaptcha solver = new TwoCaptcha("YOUR_API_KEY");
        }
    }
}
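Make sure you declare your code inside the TwoCaptcha namespace, as shown above. The library's TwoCaptcha class lives in a namespace of the same name, so from a different namespace the compiler can resolve TwoCaptcha to the namespace instead of the class and report an error like 'TwoCaptcha' is a namespace but is used like a type.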
After that, create a reCAPTCHA instance to represent the challenge you're trying to solve. Then, set the site key and the target URL as in the code snippet below.
namespace TwoCaptcha
{
    public class Scraper
    {
        static void Main(string[] args)
        {
            // ...
            // create a new instance of the ReCaptcha class
            ReCaptcha captcha = new ReCaptcha();
            // set the site key for the ReCaptcha
            captcha.SetSiteKey("6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-");
            // set the URL of the target webpage where the ReCaptcha is located
            captcha.SetUrl(target_url);
        }
    }
}
Next, call the solver.Solve() method to solve the reCAPTCHA. This method makes a POST request to submit the CAPTCHA and get its ID, then polls 2captcha for the solution. Let's also print the solution to see what it looks like.
namespace TwoCaptcha
{
    public class Scraper
    {
        static void Main(string[] args)
        {
            // ...
            // declare solution variable
            string solution = null;
            try
            {
                // call the Solve method to solve the ReCaptcha
                solver.Solve(captcha).Wait();
                solution = captcha.Code;
                // print the captcha solution
                Console.WriteLine("Captcha solved: " + solution);
            }
            catch (AggregateException e)
            {
                // if an error occurs during captcha solving, catch the AggregateException and print the error message
                Console.WriteLine("Error occurred: " + e.InnerExceptions.First().Message);
                // without a solution there is nothing to submit, so close the browser and stop
                driver.Quit();
                return;
            }
        }
    }
}
The code snippet above uses the try-catch block to handle any exceptions that may occur during the solving process. The InnerExceptions property of the AggregateException contains details about the specific exceptions that could occur. In this case, it prints the message of the first inner exception.
The code declares the solution variable outside the try-catch block so you can access it later in the code.
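The reason the catch block targets AggregateException is that blocking on a task with Wait() wraps any failure in one, whereas awaiting the same task rethrows the original exception. The sketch below shows the difference with a stand-in task. FailingSolveAsync and its error message are made up for illustration and are not part of the 2captcha library.

```csharp
using System;
using System.Threading.Tasks;

public static class WaitVersusAwaitSketch
{
    // Stand-in for a failing solve call; the message is made up for illustration.
    private static Task FailingSolveAsync()
    {
        return Task.FromException(new InvalidOperationException("solver failed"));
    }

    // Blocking with Wait() surfaces the failure wrapped in an AggregateException.
    public static string MessageFromWait()
    {
        try
        {
            FailingSolveAsync().Wait();
            return "no error";
        }
        catch (AggregateException e)
        {
            return e.InnerExceptions[0].Message;
        }
    }

    // Awaiting the same task rethrows the original exception directly.
    public static async Task<string> MessageFromAwaitAsync()
    {
        try
        {
            await FailingSolveAsync();
            return "no error";
        }
        catch (InvalidOperationException e)
        {
            return e.Message;
        }
    }
}
```

If you switch Main to async Task Main and await the solve call, you'd catch the specific exception type instead of unwrapping InnerExceptions.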
All that's left is injecting the solution into the page to submit the solved CAPTCHA to the web server.
For that, inspect the target webpage in a browser to locate the textarea element with the g-recaptcha-response ID.
Lastly, select the g-recaptcha-response element and set its value to the CAPTCHA solution. Then click the submit button, take a screenshot to see your result, and close the browser.
namespace TwoCaptcha
{
    public class Scraper
    {
        static void Main(string[] args)
        {
            // ...
            // select g-recaptcha-response element
            IWebElement recaptchaResponseElement = driver.FindElement(By.Id("g-recaptcha-response"));
            // set the value of the selected element to the CAPTCHA solution, passed as a script argument
            ((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].value = arguments[1];", recaptchaResponseElement, solution);
            // click the submit button
            IWebElement submitButton = driver.FindElement(By.CssSelector("#recaptcha-demo-submit"));
            submitButton.Click();
            // take a screenshot
            ((ITakesScreenshot)driver).GetScreenshot().SaveAsFile("screenshot.png");
            // close the browser
            driver.Quit();
        }
    }
}
That's it.
Now, let's put everything together for the final code.
// import the required libraries
using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using TwoCaptcha.Captcha;

namespace TwoCaptcha
{
    public class Scraper
    {
        static void Main(string[] args)
        {
            // set up ChromeDriver
            IWebDriver driver = new ChromeDriver();
            // navigate to the target url
            string target_url = "https://www.google.com/recaptcha/api2/demo";
            driver.Navigate().GoToUrl(target_url);
            // create a new TwoCaptcha instance
            TwoCaptcha solver = new TwoCaptcha("YOUR_API_KEY");
            // create a new instance of the ReCaptcha class
            ReCaptcha captcha = new ReCaptcha();
            // set the site key for the ReCaptcha
            captcha.SetSiteKey("6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-");
            // set the URL of the target webpage where the ReCaptcha is located
            captcha.SetUrl(target_url);
            // declare solution variable
            string solution = null;
            try
            {
                // call the Solve method to solve the ReCaptcha
                solver.Solve(captcha).Wait();
                solution = captcha.Code;
                // print the captcha solution
                Console.WriteLine("Captcha solved: " + solution);
            }
            catch (AggregateException e)
            {
                // if an error occurs during captcha solving, catch the AggregateException and print the error message
                Console.WriteLine("Error occurred: " + e.InnerExceptions.First().Message);
                // without a solution there is nothing to submit, so close the browser and stop
                driver.Quit();
                return;
            }
            // select g-recaptcha-response element
            IWebElement recaptchaResponseElement = driver.FindElement(By.Id("g-recaptcha-response"));
            // set the value of the selected element to the CAPTCHA solution, passed as a script argument
            ((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].value = arguments[1];", recaptchaResponseElement, solution);
            // click the submit button
            IWebElement submitButton = driver.FindElement(By.CssSelector("#recaptcha-demo-submit"));
            submitButton.Click();
            // take a screenshot
            ((ITakesScreenshot)driver).GetScreenshot().SaveAsFile("screenshot.png");
            // close the browser
            driver.Quit();
        }
    }
}
Run it, and you should get the following result:
Congrats! You've bypassed your first CAPTCHA with Selenium C#.
Not to spoil the party, but while 2captcha works during testing, it can get expensive and slow down your automation, particularly in large-scale scraping. Plus, it doesn't solve all CAPTCHA types. So, let's explore an alternative that avoids the CAPTCHA altogether.
Method #2: Bypass CAPTCHA With a Web Scraping API
As implied earlier, you can avoid CAPTCHAs completely by emulating human browsing behavior.
Selenium can imitate browser interactions, but it has some limitations that make emulating user behavior challenging.
For example, websites can easily detect automation properties like navigator.webdriver. Plus, Selenium can get slow and resource-intensive, especially in large-scale scraping.
Luckily, ZenRows offers the ultimate alternative, a web scraping API designed to scrape any web page, regardless of CAPTCHA type and complexity. This tool provides the same headless browser functionality as Selenium but without the additional overhead.
Unlike Selenium, ZenRows guarantees CAPTCHA bypass. But don't just take our word for it. Let's prove it.
Here's Selenium against G2, a website that presents a CAPTCHA challenge when you trigger its anti-bot systems.
// import the required libraries
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace TwoCaptcha
{
    public class Example
    {
        static void Main(string[] args)
        {
            // set up ChromeDriver
            IWebDriver driver = new ChromeDriver();
            // navigate to the target url
            string target_url = "https://www.g2.com/products/asana/reviews";
            driver.Navigate().GoToUrl(target_url);
            // take a screenshot
            ((ITakesScreenshot)driver).GetScreenshot().SaveAsFile("screenshot.png");
            // close the browser
            driver.Quit();
        }
    }
}
Below is the result:
Here, Selenium triggers the page's anti-bot system and is presented with a Turnstile CAPTCHA challenge, which the reCAPTCHA script from Method #1 isn't set up to solve.
Rather than solving this CAPTCHA, the goal is to avoid triggering it in the first place. Below are the steps to achieve that using ZenRows.
To get started, sign up, and you'll get redirected to the Request Builder page.
Paste your target URL (https://www.g2.com/products/asana/reviews), check the box for Premium Proxies, and activate the JavaScript Rendering boost mode.
Then, select a language (C#), and you'll get your script ready to try.
Although the generated code suggests RestSharp, you can use any C# HTTP Client of your choice, such as the built-in HttpClient.
Your code should look like this.
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace TestApplication {
    class Test {
        static async Task Main(string[] args) {
            // request URL with the target page URL-encoded in the url parameter
            string apiUrl = "https://api.zenrows.com/v1/?apikey=<YOUR_ZENROWS_API_KEY>&url=https%3A%2F%2Fwww.g2.com%2Fproducts%2Fasana%2Freviews&js_render=true&premium_proxy=true";
            using (HttpClient client = new HttpClient()) {
                HttpResponseMessage response = await client.GetAsync(apiUrl);
                if (response.IsSuccessStatusCode) {
                    // print the page's HTML
                    string content = await response.Content.ReadAsStringAsync();
                    Console.WriteLine(content);
                }
                else {
                    Console.WriteLine($"Failed to retrieve data. Status code: {response.StatusCode}");
                }
            }
        }
    }
}
Run it, and you'll get the page's HTML.
<!DOCTYPE html>
<!-- ... -->
<title id="icon-label-55be01c8a779375d16cd458302375f4b">G2 - Business Software Reviews</title>
<!-- ... -->
<h1 ...id="main">Where you go for software.</h1>
Awesome, right? That's how easy it is to bypass CAPTCHAs using ZenRows.
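To scrape other pages with the same settings, you can build the request URL in code instead of copying it from the Request Builder each time. The helper below is a sketch: the class and method names are hypothetical, and it only reuses the endpoint and the apikey, url, js_render, and premium_proxy parameters that appear in the request above.

```csharp
using System;

public static class ZenRowsUrlSketch
{
    // Builds the request URL, percent-encoding the API key and the target URL
    // so characters like ':' and '/' survive inside the query string.
    public static string BuildRequestUrl(string apiKey, string targetUrl)
    {
        return "https://api.zenrows.com/v1/"
            + "?apikey=" + Uri.EscapeDataString(apiKey)
            + "&url=" + Uri.EscapeDataString(targetUrl)
            + "&js_render=true&premium_proxy=true";
    }
}
```

Pass the returned string to client.GetAsync() in place of the hard-coded apiUrl.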
Conclusion
CAPTCHAs are annoying web scraping obstacles, but third-party services like 2captcha can help you overcome them. However, your Selenium CAPTCHA bypass script may not work against advanced anti-bot protection. So, consider ZenRows, an all-in-one solution for scraping any website and bypassing any CAPTCHA type.