As web security evolves, scraping with open-source tools like Cloudscraper and no extra fortifications becomes nearly impossible. You'll be denied access to your desired data, and your IP address may get banned.
To avoid this, you must boost your Cloudscraper-based scraper with extra features, such as proxies.
This guide will walk you through all the steps needed to configure a proxy in Cloudscraper, as well as a tutorial on using rotating and premium proxies.
Let's go!
1. Set Up a Proxy With Cloudscraper
Setting up a Cloudscraper proxy is straightforward. You only need to pass your proxy details via the proxies parameter of the request method.
First, import the required library and call the create_scraper() function. This function returns a Cloudscraper instance, similar to the requests.Session object of the popular Requests library.
# pip3 install cloudscraper
import cloudscraper
# create CloudScraper instance
scraper = cloudscraper.create_scraper()
Next, define your proxy settings and use the proxies parameter to pass your proxy details in your request.
#...
# define your proxy
proxy = {
'http': 'http://43.133.59.220:3128',
'https': 'http://43.133.59.220:3128'
}
# make a request using the proxy
response = scraper.get('https://httpbin.io/ip', proxies=proxy)
For illustrative purposes, this example makes a request to HTTPBin using a proxy from the Free Proxy List.
To verify your configuration works, combine the code snippets above and add a print statement to get the final code.
# pip3 install cloudscraper
import cloudscraper
# create CloudScraper instance
scraper = cloudscraper.create_scraper()
# define your proxy
proxy = {
'http': 'http://43.133.59.220:3128',
'https': 'http://43.133.59.220:3128'
}
# make a request using the proxy
response = scraper.get('https://httpbin.io/ip', proxies=proxy)
print(response.text)
Some proxies, mostly free ones, can cause SSL issues. If you encounter SSL errors, try disabling SSL verification by setting the verify parameter to False. Keep in mind that this turns off certificate checks entirely, so only use it for testing.
# make a request using the proxy
response = scraper.get('https://httpbin.io/ip', proxies=proxy, verify=False)
If done correctly, your result will be the proxy's IP address.
{
"origin": "43.133.59.220:13152"
}
Congratulations! You've successfully added a proxy to your Cloudscraper scraper.
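If you'd rather have your script confirm the proxy took effect instead of eyeballing the output, you can compare the reported origin against the proxy's IP. Here's a minimal sketch; the uses_proxy helper is our own, not part of Cloudscraper:

```python
import json

def uses_proxy(response_text, proxy_ip):
    """Return True if httpbin's reported origin matches the proxy IP."""
    origin = json.loads(response_text)["origin"]
    # httpbin.io reports "ip:port", so compare only the IP part
    return origin.split(":")[0] == proxy_ip

# example with the sample response shown above
print(uses_proxy('{"origin": "43.133.59.220:13152"}', "43.133.59.220"))  # True
```

You can call it as uses_proxy(response.text, '43.133.59.220') right after the request to fail fast when the proxy is silently ignored.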
2. Authenticate Proxies
Some proxy providers require authentication details, such as username and password, to regulate access to their proxy servers.
To authenticate a Cloudscraper proxy, include the required credentials in the proxy URL. Here's the format for username and password authentication.
<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>
Here's a code example:
import cloudscraper
# create CloudScraper instance
scraper = cloudscraper.create_scraper()
# define your proxy
proxy = {
'http': 'http://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>',
'https': 'http://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>'
}
# make a request using the proxy
response = scraper.get('https://httpbin.io/ip', proxies=proxy)
print(response.text)
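Hard-coding credentials is risky if your script ever lands in version control. One option is to build the proxy dictionary from environment variables instead. This is a sketch under our own conventions: the PROXY_USERNAME and PROXY_PASSWORD variable names and the build_proxy helper are our choices, not part of Cloudscraper:

```python
import os

def build_proxy(host, port):
    """Build a requests-style proxies dict from environment credentials."""
    # assumes PROXY_USERNAME and PROXY_PASSWORD are exported in your shell
    user = os.environ["PROXY_USERNAME"]
    password = os.environ["PROXY_PASSWORD"]
    url = f"http://{user}:{password}@{host}:{port}"
    # the same proxy URL handles both http and https traffic
    return {"http": url, "https": url}
```

You'd then call scraper.get(url, proxies=build_proxy('<PROXY_IP_ADDRESS>', '<PROXY_PORT>')) as before.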
3. Rotate Proxies
While configuring a Cloudscraper proxy helps you hide your IP address, websites can still block or ban your proxy. This can occur due to excessive requests originating from the same IP address. Anti-bot systems often flag such activities as suspicious, leading to 403 Forbidden errors with Cloudscraper or IP bans.
That's why it's essential to rotate between proxies. You'll be able to distribute traffic across multiple IP addresses, making your requests appear to originate from different users.
Follow the steps below to rotate proxies in Cloudscraper.
First, define your proxy list.
#...
# define a proxy list
proxies_list = [
{'http': 'http://43.133.59.220:3128', 'https': 'http://43.133.59.220:3128'},
{'http': 'http://73.117.183.115:80', 'https': 'http://73.117.183.115:80'},
{'http': 'http://50.174.7.154:80', 'https': 'http://50.174.7.154:80'}
]
We've grabbed a few IPs from the Free Proxy List. If they don't work at the time of reading, grab new ones from the same page.
After that, select a proxy at random using the random.choice() function. You'll need to import Python's random module for that.
# ...
import random
# ...
# select a proxy from the list at random
random_proxy = random.choice(proxies_list)
# ...
Lastly, route your request through the selected proxy.
# ...
# make your request using the random proxy
response = scraper.get('https://httpbin.io/ip', proxies=random_proxy)
print(response.text)
Put everything together to test your script.
import cloudscraper
import random
# create CloudScraper instance
scraper = cloudscraper.create_scraper()
# define a proxy list
proxies_list = [
{'http': 'http://43.133.59.220:3128', 'https': 'http://43.133.59.220:3128'},
{'http': 'http://73.117.183.115:80', 'https': 'http://73.117.183.115:80'},
{'http': 'http://50.174.7.154:80', 'https': 'http://50.174.7.154:80'}
]
# select a proxy from the list at random
random_proxy = random.choice(proxies_list)
# make your request using the random proxy
response = scraper.get('https://httpbin.io/ip', proxies=random_proxy)
print(response.text)
If everything works, you'll get a different IP address for each request. Here are the results for three runs:
# request 1
{
"origin": "50.174.7.154:38289"
}
# request 2
{
"origin": "73.117.183.115:29097"
}
# request 3
{
"origin": "43.133.59.220:34932"
}
Well done!
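Free proxies die often, so in practice you may want to keep trying proxies until one succeeds rather than betting on a single random pick. Here's a minimal sketch building on the proxies_list and scraper from above; the retry logic and the fetch_with_rotation helper are our own, not part of Cloudscraper:

```python
import random

def fetch_with_rotation(get, url, proxies_list):
    """Try each proxy in random order until one returns a response."""
    candidates = proxies_list[:]
    random.shuffle(candidates)
    last_error = None
    for proxy in candidates:
        try:
            return get(url, proxies=proxy, timeout=10)
        except Exception as error:  # connection errors, timeouts, etc.
            last_error = error
    raise RuntimeError(f"all proxies failed: {last_error}")

# usage with the scraper from above:
# response = fetch_with_rotation(scraper.get, 'https://httpbin.io/ip', proxies_list)
```

Passing scraper.get as the get argument keeps the helper decoupled from Cloudscraper, which also makes it easy to test with a stub.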
4. Use Premium Proxies
Although we used free proxies to show you the basic configurations, they're generally unreliable and unsuitable for real-world use cases. They are usually slow, have a short life span, and websites can easily detect and block your requests.
You need premium proxies for consistent performance and to increase your chances of avoiding detection. The best ones, such as ZenRows residential proxies, will significantly increase your success rate and grant you access to heavily protected websites. ZenRows automatically rotates residential proxies under the hood, making it easy for you to disguise your web activity and fly under the radar.
To help you get started, here's a quick guide on how to use ZenRows residential proxies.
Sign up to access your dashboard. Select Residential Proxies in the left menu section and create a new proxy user. You'll be directed to the Proxy Generator page.
Copy your proxy URL for use in your Cloudscraper script. ZenRows allows you to choose between the auto-rotate option and sticky sessions.
Here's the final code using ZenRows residential proxies:
import cloudscraper
# create CloudScraper instance
scraper = cloudscraper.create_scraper()
# define your proxy
proxy = {
'http': 'http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1337',
'https': 'http://<ZENROWS_PROXY_USERNAME>:<ZENROWS_PROXY_PASSWORD>@superproxy.zenrows.com:1338'
}
# make a request using the proxy
response = scraper.get('https://httpbin.io/ip', proxies=proxy)
print(response.text)
Since ZenRows automatically rotates proxies under the hood, you'll get a different IP address for each request.
Here's the result for three runs:
# request 1
{
"origin": "72.27.89.127:49528"
}
# request 2
{
"origin": "5.14.169.145:59636"
}
# request 3
{
"origin": "73.234.147.78:41632"
}
However, you should know that proxies alone are not always enough: advanced anti-bot systems employ evolving techniques that can still flag your scraper and block your requests.
In such cases, use the ZenRows web scraping API. It comes with the same subscription plan as the premium proxy service!
This web scraping API provides everything you need to bypass any anti-bot system, regardless of complexity. Therefore, if the previous approach fails, the ZenRows web scraping API will definitely bypass Cloudflare and can completely replace Cloudscraper.
Conclusion
Configuring premium proxies can increase your chances of avoiding detection and IP-based restrictions. To get started, you can choose the right fit for your project from this list of the best residential proxies.
However, proxies are not foolproof and can fail against sophisticated anti-bot protection. The only surefire way to never get blocked by Cloudflare is to use a web scraping API, such as ZenRows.